1. FIELD OF THE INVENTION
The present invention relates to a novel approach to 
drug discovery. More particularly, the invention relates to 
a system for preserving the genomes of organisms that are 
good or promising sources of drugs; for randomly combining 
genetic materials from one or more species of organisms to 
generate novel metabolic pathways; and for pre-screening or 
screening such genetically engineered cells for the 
generation of novel biochemical pathways and the production 
of novel classes of compounds. The novel or reconstituted 
metabolic pathways can have utility in commercial production 
of the compounds. 

2. BACKGROUND OF THE INVENTION 
2.1. SOURCES OF DRUG LEADS 

The basic challenges in drug discovery are to identify a 
lead compound with the desirable activity, and to optimize 
the lead compound to meet the criteria required to proceed 
with further drug development. One common approach to drug 
discovery involves presenting macromolecules implicated in 
causing a disease (disease targets) in bioassays in which 
potential drug candidates are tested for therapeutic 
activity. Such molecules could be receptors, enzymes or 
transcription factors. 

Another approach involves presenting whole cells or 
organisms that are representative of the causative agent of 
the disease. Such agents include bacteria and tumor cell
lines.

Traditionally, there are two sources of potential drug
candidates: collections of natural products and synthetic
chemicals. Identification of lead compounds has been
achieved by random screening of such collections, which
encompass as broad a range of structural types as possible.
The recent development of synthetic combinatorial chemical
libraries will further increase the number and variety of
compounds available for screening. However, the diversity in
any synthetic chemical library is limited by human
imagination and skills of synthesis.

Random screening of natural products from sources such 
as terrestrial bacteria, fungi, invertebrates and plants has 
resulted in the discovery of many important drugs (Franco et
al. 1991, Critical Rev Biotechnol 11:193-276; Goodfellow et
al. 1989, in "Microbial Products: New Approaches", Cambridge
University Press, pp. 343-383; Berdy 1974, Adv Appl Microbiol
18:309-406; Suffness et al. 1988, in Biomedical Importance of
Marine Organisms, D.G. Fautin, California Academy of
Sciences, pages 151-157). More than 10,000 of these natural
products are biologically active and at least 100 of these
are currently in use as antibiotics, agrochemicals and
anti-cancer agents. The success of this approach of drug
discovery depends heavily on how many compounds enter a 
screening program. Typically, pharmaceutical companies 
screen compound collections containing hundreds of thousands 
of natural and synthetic compounds. However, the ratio of
novel to previously-discovered compounds has diminished with
time. In screens for anti-cancer agents, for example, most
of the microbial species which are biologically active may
yield compounds that are already characterized. Partly, this
is due to the difficulties of consistently and adequately
finding, reproducing and supplying novel natural product
samples. Since biological diversity is largely due to
underlying molecular diversity, there is insufficient
biological diversity in the organisms currently selected for
random screening, which reduces the probability that novel
compounds will be isolated.

Novel bioactivity has consistently been found in various
natural sources. See, for example, Cragg et al., 1994 (in
"Ethnobotany and the Search for New Drugs", Wiley,
Chichester, pp. 178-196). Few of these sources have been
explored systematically and thoroughly for novel drug leads.
For example, it has been estimated that only 5000 plant 
species have been studied exhaustively for possible medical 
use. This is a minor fraction of the estimated total of 
250,000-3,000,000 species, most of which grow in the tropics 
(Abelson 1990, Science 247:513). Moreover, out of the 
estimated millions of species of marine microorganisms, only 
a small number have been characterized. Indeed, there is 
tremendous biodiversity that remains untapped as sources of 
lead compounds. 

Terrestrial microorganisms, fungi, invertebrates and 
plants have historically been used as sources of natural 
products. However, apart from several well-studied groups of
organisms, such as the actinomycetes, which have been
developed for drug screening and commercial production,
reproducibility and production problems still exist. For
example, the antitumor agent, taxol, is a constituent of the
bark of mature Pacific yew trees, and its supply as a
clinical agent has caused concern about damage to the local
ecological system. Taxol contains 11 chiral centers with
2048 possible diastereoisomeric forms so that its de novo
synthesis on a commercial scale seems unlikely (Phillipson,
1994, Trans Royal Soc Trop Med Hyg 88 Supp 1:17-19).
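
The figure of 2048 cited above for taxol follows from the usual upper bound of 2^n stereoisomers for a molecule with n chiral centers. The short sketch below merely reproduces that arithmetic; the function name is hypothetical, and the bound assumes no meso forms or other symmetry that would reduce the count.

```python
def max_stereoisomers(chiral_centers: int) -> int:
    """Upper bound on the number of stereoisomers for a molecule with
    the given number of chiral centers (2**n).

    Assumption: no meso forms or molecular symmetry reduce the count.
    """
    if chiral_centers < 0:
        raise ValueError("number of chiral centers must be non-negative")
    return 2 ** chiral_centers


# Taxol is described in the text as having 11 chiral centers.
print(max_stereoisomers(11))  # 2048
```

Only one of these forms corresponds to the natural product, which illustrates why a stereoselective de novo synthesis at commercial scale is regarded as difficult.
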
Marine invertebrates are a promising source of novel
compounds but there exist major weaknesses in the technology
for conducting drug screens and large-scale resupply. For
instance, marine invertebrates can be difficult to recollect, 
and many have seasonal variability in natural product 
content. 

WO 00/52180

PCT/US00/05707

Marine microorganisms are a promising source of novel
compounds but there also exist major weaknesses in the
technology for conducting drug screens and industrial
fermentation with marine microorganisms. For instance,
marine microorganisms are difficult to collect, establish and
maintain in culture, and many have specialized nutrient
requirements. A reliable source of unpolluted seawater is
generally essential for fermentation. It is estimated that
at least 99% of marine bacteria species do not survive on
laboratory media. Furthermore, available commercial
fermentation equipment is not optimal for use in saline
conditions, or under high pressure.

Furthermore, certain compounds appear in nature only
when specific organisms interact with each other and the
environment. Pathogens may alter plant gene expression and
trigger synthesis of compounds, such as phytoalexins, that
enable the plant to resist attack. For example, the wild
tobacco plant Nicotiana sylvestris increases its synthesis of
alkaloids when under attack from larvae of Manduca sexta.
Likewise, fungi can respond to phytoalexins by detoxification
or preventing their accumulation. Such metabolites will be
missed by traditional high-throughput screens, which do not
evaluate a fungus together with its plant host. A dramatic
example of the influence of the natural environment on an
organism is seen with the poison dart frog. While a lethal
dose of the sodium channel agonist alkaloid, batrachotoxin,
can be harvested by rubbing the tip of a blow dart across the
glandular back of a field specimen, batrachotoxin could not
be detected in second generation terrarium-reared frogs
(Daly, 1995, Proc. Natl. Acad. Sci. 92:9-13). If only
traditional drug screening technologies are applied,
potentially valuable molecules such as these may never be
discovered.

Moreover, a lead compound discovered through random
screening rarely becomes a drug, since its potency,
selectivity, bioavailability or stability may not be
adequate. Typically, a certain quantity of the lead compound
is required so that it can be modified structurally to
improve its initial activity. However, current methods for
synthesis and development of lead compounds from natural
sources, especially plants, are relatively inefficient.
There are significant obstacles associated with various
stages of drug development, such as recollection, growth of
the drug-producing organism, dereplication, strain
improvement, media improvement, and scale-up production.
These problems delay clinical testing of new compounds and
affect the economics of using these new sources of drug
leads.

At present, the above-mentioned marine, botanical and
animal sources of natural products are underused. The
currently available methods for producing and screening lead
compounds cannot be applied efficiently to these
under-explored sources. Unlike some terrestrial bacteria and
fungi, these drug-producing organisms are not readily
amenable to industrial fermentation technologies.
Simultaneously, the pressure for finding novel sources for
drugs is intensified by new high-efficiency and
high-throughput screening technologies. Therefore, there is a
general need for methods of harnessing the genetic resources
and chemical diversity of these as yet untapped sources of
compounds for the purpose of drug discovery.

2.2. EXPRESSION LIBRARIES 
Most recently, drug discovery programs have shifted to
mechanism-based discovery screens. Once a molecular target
is identified (e.g., a hormone receptor involved in
regulating the disease), assays are designed to identify
and/or synthesize therapeutic agents that interact at a
molecular level with the target.

Gene expression libraries are used to identify,
investigate and produce the target molecules. Expression
cloning has become a conventional method for obtaining the
target gene encoding a single protein without knowing the
protein's physical properties.

Many proteins identified by screening gene expression
libraries prepared from human and mammalian tissues are
potential disease targets, e.g., receptors (Simonsen et al.
1994, Trends Pharmacol Sci 15:437-441; Nakayama et al. 1992,
Curr Opin Biotechnol 3:497-505; Aruffo, 1991, Curr Opin
Biotechnol, 2:735-741), and signal-transducing proteins
(Margolis et al., US 5,434,064). See Seed et al., 1987, Proc
Natl Acad Sci 84:3365-3369; Yamasaki et al., 1988, Science
241:825-828; and Lin et al., 1992, Cell 68:775-785 (type III
TGF-β receptor) for examples of proteins identified by
functional expression cloning in mammalian cells.

Once a disease target is identified, the protein target 
or engineered host cells that express the protein target have 
been used in biological assays to screen for lead compounds 
(Luyten et al. 1993, Trends Biotechnol 11:247-54). Thus,
within the scheme of drug discovery, the use of gene
expression libraries has been largely limited to the
identification and production of potential protein disease
targets. Only in those instances where the drug is a protein
or small peptide, e.g., antibodies, have expression libraries
been prepared in order to generate and screen for molecules
having the desirable biological activity (Huse et al. 1991,
Ciba Foundation Symp 159:91-102).

However, there are other applications of gene expression 
libraries that are relevant to drug discovery. Gene 
libraries of microorganisms have been prepared for the 
purpose of identifying genes involved in biosynthetic 
pathways that produce medicinally-active metabolites and
specialty chemicals. These pathways require multiple
proteins (specifically, enzymes), entailing greater
complexity than the single proteins used as drug targets.
For example, genes encoding pathways of bacterial polyketide
synthases (PKSs) were identified by screening gene libraries
of the organism (Malpartida et al. 1984, Nature 309:462-;
Donadio et al. 1991, Science 252:675-679). PKSs catalyze
multiple steps of the biosynthesis of polyketides, an
important class of therapeutic compounds, and control the
structural diversity of the polyketides produced. A host-vector
system in Streptomyces has been developed that allows
directed mutation and expression of cloned PKS genes
(McDaniel et al. 1993, Science 262:1546-1550; Kao et al.
1994, Science 265:509-512). This specific host-vector system
has been used to develop more efficient ways of producing
polyketides, and to rationally develop novel polyketides
(Khosla et al., WO 95/08548).

Another example is the production of the textile dye,
indigo, by fermentation in an E. coli host. Two operons
containing the genes that encode the multienzyme biosynthetic
pathway have been genetically manipulated to improve
production of indigo by the foreign E. coli host (Ensley et
al. 1983, Science 222:167-169; Murdock et al. 1993,
Bio/Technology 11:381-386). Overall, conventional studies of
heterologous expression of genes encoding a metabolic pathway
involve directed cloning, sequence analysis, designed
mutations, and rearrangement of specific genes that encode
proteins known to be involved in previously characterized
metabolic pathways.

In view of numerous advances in the understanding of
disease mechanisms and identification of drug targets, there
is an increasing need for innovative strategies and methods 
for rapidly identifying lead compounds and channeling them 
toward clinical testing. 



3. SUMMARY OF THE INVENTION

The present invention provides a drug discovery system 
for generating and screening molecular diversity for the 
purpose of drug discovery. The method of the invention 
captures and preserves in combinatorial gene expression 
libraries the genetic material of organisms that are known or
prospective sources of drug leads.

In one embodiment, the invention involves the 
construction of combinatorial natural pathway gene expression 
libraries from one or more species of donor organisms
including microbes, plants and animals, especially those that
cannot be recovered in substantial amounts in nature, or be
cultured in the laboratory. The donor organisms in the pool
may be selected on the basis of their known biological 
properties, or they may be a mixture of known and/or 
unidentified species of organisms collected from nature. 
Random fragments of the genomes of donor organisms, some of 
which contain entire biochemical pathways or portions 
thereof, are cloned and expressed in the host organisms. 

According to the invention, a subset of the gene 
products of the cloned DNA are capable of functioning in the 
host organism. The naturally-occurring pathways of the donor 
organisms may thus be reconstituted in the host organism. 
The expression of donor genes in the dissimilar physiological
and regulatory environment of a heterologous host can unmask
otherwise silent metabolic pathways. The metabolic pathways 
of the donor organism may also interact with metabolic 
pathways resident in the host organism to generate novel 
compounds or compounds not normally produced by the host 
organism. 

Moreover, because only a defined subset of donor
organism genes is expressed in the host organism at any one
time, the system can render metabolic pathways and compounds 
easier to detect against an already characterized 
biochemical/cellular background of the host organism. 
Essentially, the genetic resources of these donor organisms 
are captured and preserved in the gene expression libraries 
which can be replicated and used repeatedly in different drug 
discovery programs. 

In another embodiment, the invention involves the 
construction of combinatorial chimeric pathway expression 
libraries in which genetic material derived from one or more 
species of donor organism is randomly combined, cloned, and 
expressed in the host organism. Such libraries generate 
random combinations of genes from multiple pathways and 
organisms, which gives rise to metabolic pathways and 
discrete gene sets previously non-existent in nature. The
term "discrete gene set" refers to any assemblage of two or
more genes obtained from the ligation of genes from one or
more pathways or organisms in a combinatorial gene expression
library. The plurality of gene products are capable of
functioning in the host organism, where they interact to form 
novel chimeric metabolic pathways that produce novel classes 
of compounds. Thus, the diversity of molecular structures 
available for drug screening is increased by mixing the 
genetic material of the extant pathways and organisms in the 
combinatorial chimeric gene expression library. 
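
As a rough illustration of how random concatenation enlarges the space of possible gene sets, the following sketch counts the ordered combinations obtainable when a fixed number of gene cassettes is drawn at random from a pool, and draws one such combination. All names and numbers here are hypothetical and chosen only for illustration (a pool of 100 cassettes and five cassettes per construct); the sketch models combinatorics only, not any cloning procedure.

```python
import random


def ordered_concatemer_count(pool_size: int, cassettes_per_construct: int) -> int:
    """Number of ordered concatemers when each position is drawn
    independently (with replacement) from the cassette pool."""
    if pool_size < 0 or cassettes_per_construct < 0:
        raise ValueError("arguments must be non-negative")
    return pool_size ** cassettes_per_construct


def random_concatemer(pool, cassettes_per_construct, rng):
    """Draw one hypothetical construct: an ordered list of cassettes
    chosen uniformly at random, with replacement, from the pool."""
    return [rng.choice(pool) for _ in range(cassettes_per_construct)]


# Hypothetical pool of 100 gene cassettes pooled from several donor species.
pool = [f"cassette_{i}" for i in range(1, 101)]

print(ordered_concatemer_count(len(pool), 5))  # 10000000000
print(random_concatemer(pool, 5, random.Random(0)))
```

Under these assumptions even a modest pool yields far more distinct gene sets than could be screened exhaustively, which is consistent with the emphasis placed on pre-screening and high-throughput screening of the libraries.
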

In yet another embodiment, the invention involves biased 
combinatorial expression libraries wherein the donor genetic
material in the libraries is preselected, and may not
contain the entire genome of the donor organisms. The
invention also provides mobilizable combinatorial expression 
libraries in which the cloned donor genetic material can be 
transferred from one or more species of host organism to at 
least one other species of host organism. 

In yet another embodiment, the invention provides a
recombined combinatorial gene expression library wherein a
portion of the genetic material derived from a plurality of
species of donor organisms has been subjected to homologous
or homeologous recombination. The process of homologous or
homeologous recombination can be carried out in a
recombination-permissive cell, or in vitro in a reaction
comprising the appropriate enzymes and cofactors. This
approach allows the random exchange of genetic sequences
among nucleic acid molecules which share sequence
similarities. The process results in DNA containing random
combinations of structurally and functionally-related genes
from multiple pathways and organisms, which can give rise to
metabolic pathways and discrete gene sets previously
non-existent in nature. The recombined genetic material can be
used to prepare combinatorial gene expression libraries,
including non-mobilizable and mobilizable combinatorial gene
expression libraries, and biased combinatorial gene
expression libraries.

While standard methods of screening gene expression 
libraries can be used, the libraries can be further modified 
to incorporate a reporter regimen tailored to identify clones 
that are expressing the desirable pathways and metabolic
products. In a specific embodiment, the host organisms are
engineered to include a gene encoding a reporter protein
operatively associated with a chemoresponsive promoter that
responds to the desirable class of metabolites to be detected
in the expression library.

In an alternative embodiment, the host organism may be 
exposed to a physiological probe which is a precursor of a 
reporter molecule that is converted directly or indirectly to 
the reporter molecule by a compound produced in the pathway 
sought. Activation of expression of the reporter or 
conversion of a reporter precursor produces a signal that 
allows for identification and isolation of the desirable
clones.

In yet another embodiment of the invention, the host 
organisms in the library may be embedded in a semi-solid
matrix with a reporter regimen or another indicator cell type
that contains an assay or is itself a target for the
desirable compound, e.g., pathogens for anti-infectives, or
cancer cells for antitumor agents. High-throughput screening
processes can be used, e.g., macrodroplet sorting,
fluorescence activated cell sorting or magnetic activated
cell sorting, to identify and isolate the desired organisms
in a combinatorial gene expression library.

The positive clones may be further analyzed for the 
production of novel compounds. The genetics and biochemistry
of the metabolic pathway that leads to production of the novel
compounds may be delineated by characterizing the genetic
material that was introduced into the isolated clones.

The present invention also relates to recombinant DNA 
vectors useful for constructing combinatorial gene expression 
libraries, specific combinatorial gene expression libraries,
host organisms containing a particular type of reporter
system, host organisms modified for facilitating production
of otherwise toxic compounds, and compositions comprising
host organisms, indicator cells and/or a reporter regimen.

3.1. DEFINITIONS 
As used herein, the following terms will have the 
meanings indicated. 

A "combinatorial natural pathway expression library" is 
a library of expression constructs prepared from genetic 
material derived from a plurality of species of donor 
organisms, in which genes present in the genetic material are 
operably associated with regulatory regions that drive 
expression of the genes in an appropriate host organism. The 
combinatorial expression library utilizes host organisms that 
are capable of producing functional gene products of the 
donor organisms. The genetic material in each host
organism encodes naturally-occurring biochemical pathways or
portions thereof from one of the donor organisms.

A "combinatorial chimeric pathway expression library" is 
a library of expression constructs prepared from randomly 
concatenated genetic material derived from one or more 
species of donor organisms, in which genes present in the 
genetic material are operably associated with regulatory 
regions that drive expression of the genes in an appropriate 
host organism. The host organisms used are capable of 
producing functional gene products of the donor organisms. 

A "biased combinatorial gene expression library" is a 
library of expression constructs prepared from genetic 
material derived from one or more species of donor organisms, 
which has been preselected for a specific property. The 
preselected genetic material can be used to prepare 
combinatorial natural pathway or chimeric libraries. 

A "mobilizable combinatorial gene expression library" is 
a library of expression constructs prepared from genetic 
material derived from one or more species of donor organisms, 
and cloned in a shuttle vector that enables the transfer of
the donor genetic materials from one or more species or
strain of donor organism to at least one other species or
strain of host organism. A shuttle vector can be used to
prepare combinatorial natural pathway or chimeric libraries.

A "recombined combinatorial gene expression library" is 
a library of expression constructs prepared from genetic 
material derived from a plurality of species of donor 
organisms, wherein a portion of said genetic materials has
undergone homologous or homeologous recombination. The
recombined genetic materials are used to prepare recombined 
combinatorial gene expression libraries, including 
mobilizable combinatorial gene expression libraries, or 
biased combinatorial gene expression libraries. 

As used herein, the term "library" refers to expression
constructs or host organisms containing the expression
constructs.

The terms "biochemical pathway", "natural pathway" and 
"metabolic pathway" encompass any series of related 
biochemical reactions that are carried out by an organism. 
Such pathways may include but are not limited to biosynthetic
or biodegradative pathways, or pathways of energy generation
or conversion.

A "compound" is any molecule that is the result or by- 
product of a biochemical pathway, and is usually the product 
of interactions of a plurality of gene products. 

An "activity" is the capability of a host organism to 
carry out a biochemical reaction or a series of biochemical 
reactions leading to the production of a compound of
interest.

As used in the present invention, the following
abbreviations will apply: eq (equivalents); M (Molar); mM
(millimolar); µM (micromolar); N (Normal); mol (moles); mmol
(millimoles); µmol (micromoles); nmol (nanomoles); kg
(kilograms); gm (grams); mg (milligrams); µg (micrograms); ng
(nanograms); L (liters); mL (milliliters); µl (microliters);
vol (volumes); s (seconds); and °C (degrees Centigrade).

In addition, the following abbreviations are used:
Cfu: colony forming units; LB: Luria Broth; ddH2O:
double-distilled, reverse osmosis purified water; sea H2O: Filtered
Pacific seawater; SSW: synthetic seawater; FACS:
fluorescence-activated cell sorting; GFP: Aequorea victoria
green fluorescent protein; kbp: Kilobase pairs; g: Gravity;
rpm: Rotations per minute; CIAP: Calf intestinal alkaline
phosphatase; EDTA: Ethylenediamine tetraacetic acid; TE: 10 mM
Tris/1.5 mM EDTA pH 7.4; PEG: Polyethylene glycol; E. coli:
Escherichia coli; CHO: Chinese hamster ovary; S. cerevisiae:
Saccharomyces cerevisiae; A. nidulans: Aspergillus nidulans;
S. pombe: Schizosaccharomyces pombe; S. lividans:
Streptomyces lividans; S. aureus: Staphylococcus aureus; S.
coelicolor: Streptomyces coelicolor; B. subtilis: Bacillus
subtilis; BAC: Bacterial artificial chromosome; YAC: yeast
artificial chromosome; PCR: polymerase chain reaction; CaMV:
cauliflower mosaic virus; AcNPV: Autographa californica
nuclear polyhedrosis virus; EBV: Epstein-Barr virus; SDS:
sodium dodecyl sulfate; CsCl: cesium chloride.

4. DESCRIPTION OF THE FIGURES

Figure 1: Expression construct for combinatorial 
natural pathway expression library. The expression construct 
contains vector DNA and a donor DNA fragment that comprises 
genes encoding a metabolic pathway and natively associated
regulatory regions.

Figure 2: Expression construct for combinatorial 
chimeric pathway expression library. The expression 
construct contains vector DNA and five concatenated gene 
cassettes each comprising donor DNA and regulatory region.

Figure 3: A cloning strategy for combinatorial natural
pathway expression library. Clonable DNA (B) extracted
from donor organisms (A) is partially digested with a
restriction enzyme to generate fragments of genomic DNA (C)
encoding naturally-occurring biochemical pathways or portions
thereof. A DNA vector (D) digested with a restriction enzyme
to generate a vector having compatible ends (E) is ligated to
the fragments of genomic DNA to form expression constructs
(F).

Figures 4A-4C: Assembly of a gene cassette. Figure 4A
depicts an annealed, phosphorylated lac promoter fragment
containing a cohesive BamHI site and a blunt end
corresponding to a portion of a SmaI site. Figure 4B depicts
a promoter dimer containing a BamHI site flanked on each side
by a lac promoter. Figure 4C depicts concatenated promoter
fragments.

Figures 5A-5G: Cloning strategy for combinatorial
chimeric pathway expression library. Figure 5A shows the
steps in preparing promoter and terminator fragments for
directional cloning of cDNA and genomic DNA inserts. Figure
5B shows the steps in preparing promoter and terminator
fragments for ligation to genomic DNA inserts. Figure 5C
shows the steps in preparing cDNA inserts for directional
cloning, assembly of gene cassettes, and attachment to solid
support. Figure 5D shows the steps in preparing genomic DNA
inserts for cloning, assembly of gene cassettes and
attachment to solid support. Figures 5E to 5G show the
serial ligation and deprotection of gene cassettes to form a
concatemer, the ligation of the concatemer to an S. pombe/E.
coli shuttle vector (pDblet), release of the expression
construct from the solid support and circularization of the
expression construct.

Figures 6A-6B: Vectors useful for preparing
combinatorial gene expression libraries. Figure 6A shows a
map of Streptocos. The cosmid vector Streptocos contains a
unique BamHI site flanked by T3 and T7 promoters in the
multiple cloning site, the origin of replication and
thiostrepton resistance gene from pIJ699, a ColE1 origin
(ori), an ampicillin resistance gene (Amp) and two cos sites.
Figure 6B shows a map of modified pDblet. The plasmid pDblet is
modified in the multiple cloning site (MCS), and contains a
ColE1 origin of replication, an ampicillin resistance gene
(ApR), two copies of autonomous replicating sequence (ARS), an
ura4 marker, and the β-galactosidase gene (LacZ). A: AatII; B:
BamHI; N: NdeI. Figure 6C shows the oligomer containing an
altered BstXI sequence and an NcoI site, which was ligated in
excess to SacI/NotI cut pDblet to form modified pDblet.

Figure 7 shows a chemoresponsive construct pERD-20-GFP
comprising a reporter gene encoding green fluorescent protein
(GFP), a chemoresponsive promoter (Pm) and its associated
regulator (XylS).

Figure 8 shows a macrodroplet comprising a permeable 
matrix, in which is encapsulated a clone from a combinatorial 
gene expression library, and an indicator cell which contains
a reporter regimen.

Figures 9A and 9B provide an example of FACS sorting of
a pool of E. coli cells, with and without the presence of
expression constructs comprising marine bacterial genes. E.
coli strain XL1-MR containing the chemoresponsive construct
pERD-20-GFP, referred to as XL1-GFP, was infected with a
cosmid library of marine bacterial genes. The XL1-GFP cells
with or without the marine bacteria genes were cultured for
12 hours at 30°C, and subjected to two cycles of FACS
sorting. Figure 9A: XL1-GFP with marine bacterial genes;
Figure 9B: control XL1-GFP cells.

Figure 10 shows an alignment of the amino acid sequence
of actinorhodin dehydrase of Streptomyces coelicolor, and the
predicted partial amino acid sequence derived from CXC-AMN20. 
Plain boxes indicate sequence identity, and shaded boxes 
indicate conservative sequence homology. 

Figure 11: PCR detection of clone CXC-AMN20 sequences in
pools of genomic DNA of marine bacteria. The figure shows a
stained agarose gel containing PCR amplicons derived from
marine bacteria genomic DNA. M: molecular weight markers,
sizes in bp. -: negative control. +: positive controls for
the amplicon and for ribosomal RNA. The lanes contain
amplicons derived from T: genomic DNA from all 37 species of
marine bacteria; 1, 2, 3, 4: pools of genomic DNA of marine
bacteria.

Figure 12A-C. PCR detection of clone CXC-AMN20 sequences
in genomic DNA of marine bacteria species. The figures show
stained agarose gels containing PCR amplicons derived from
genomic DNA of individual species of marine bacteria. M:
molecular weight markers, sizes in bp. -: negative control.
+: positive controls for the amplicon and for ribosomal RNA.
The lanes contain amplicons derived from genomic DNA of
marine bacteria: species #1-10, #12-20 and #21-35 in pool 1,
2 and 3 respectively.

Figure 13. pPCos+ura. The figure shows the key
elements of the 9.6 kb S. pombe/E. coli cosmid vector
pPCos+ura: multiple cloning site (MCS), yeast selection
marker (ura4), cos sites for packaging in λ phage (cos), SV40
origin of replication (SV40 ori), neomycin resistance gene
(NeoR), ColE1 origin of replication (ColE1 ori), ampicillin
resistance gene (AmpR) and S. pombe autonomously replicating
sequence (ARS).

Figure 14. pPCos1. The figure shows the key elements
of the 9.8 kb S. pombe/E. coli cosmid vector pPCos1:
multiple cloning site (MCS), yeast selection marker (ura4),
cos sites for packaging in λ phage (cos), SV40 origin of
replication (SV40 ori), ColE1 origin of replication (ColE1
ori), ampicillin resistance gene (AmpR) and S. pombe
autonomously replicating sequence (ARS).

5. DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a drug discovery system 
that provides methods and compositions for capturing and 
preserving the diversity of genetic resources in nature, and 
for translating and expanding the captured genetic resources
into diversity of chemical structures. The invention also
facilitates screening for desirable activities and compounds. 
More particularly, the invention provides methods for
constructing and screening combinatorial gene expression
libraries. These libraries comprise random assortments of
gene products of multiple species which are in some cases
allowed to interact with each other in the expression host,
and result in some cases in the formation of novel
biochemical pathways and/or the production of novel classes
of compounds. Moreover, the libraries of the invention
provide efficient access to otherwise inaccessible sources of
molecular diversity. Some of the libraries of the invention
can be transferred from one species of host organism to
another species or strain of host organism.

The novel biochemical pathways may carry out processes 
including but not limited to structural modification of a 
substance, addition of chemical groups to the substance, or 
decomposition of the substance. 

The novel classes of compound may include but are not 
limited to metabolites, secondary metabolites, enzymes, or 
structural components of an organism. A compound of interest
may have one or more potential therapeutic properties,
including but not limited to antibiotic, antiviral,
antitumor, pharmacological or immunomodulating properties, or
may be another commercially valuable chemical, such as a
pigment. A
compound may serve as an agonist or an antagonist to a class 
of receptor or a particular receptor. 

As used in the present invention, the term
"combinatorial gene expression library" encompasses a
combinatorial natural pathway expression library and a
combinatorial chimeric pathway expression library, as well as
host organisms containing the libraries of expression
constructs.

A "combinatorial natural pathway expression library" is
a library of expression constructs prepared from genetic
material derived from one or more species of donor organisms,
in which genes present in the genetic material are operably
associated with regulatory regions that drive expression of
the genes in an appropriate host organism. The combinatorial
expression library utilizes host organisms that are capable
of producing functional gene products of the donor organisms.
The genetic material in each host organism encodes
naturally-occurring biochemical pathways or portions thereof
from one of the donor organisms.

A "combinatorial chimeric pathway expression library" is
a library of expression constructs prepared from randomly
concatenated genetic material derived from a plurality of
species of donor organisms, in which genes present in the
genetic material are operably associated with regulatory
regions that drive expression of the genes in an appropriate
host organism. The host organisms used are capable of
producing functional gene products of the donor organisms.
Upon expression in the host organism, gene products of the
donor organism(s) may interact to form novel chimeric
biochemical pathways.

Generally, the methods of the invention comprise
providing genetic material derived from one or more donor
organism(s), manipulating said genetic material, and
introducing said genetic material into a host organism via a
cloning or expression vector so that one or more genes of the
donor organism(s) are cloned and expressed in the host
organism. Such host organisms containing donor genetic
material are pooled to form a library. For some libraries,
the genetic material can be transferred from one species of
host organism to another species or strain of host organism,
in which the genetic material can be stably maintained and
expressed. Before cloning, the genetic material can be
preselected for a specific property, and/or subjected to
homologous or homeologous recombination to generate diversity
in nucleotide sequences and organization of genes within or
among one or more transcriptional units or operons. Depending
on the starting material, choice of host organisms and
vectors, any recombinant DNA techniques known in the art can
be used in combination and in any order with the techniques
and protocols described in Sections 5.3, 5.4 and 5.5, and those
used in Examples 6 and 7.

The cloned genetic material typically comprises a
random assortment of genes, the expression of which is driven
and controlled by one or more functional regulatory regions.
The expression construct or vector may provide some of these
regulatory regions. The genes of the donor organism(s) are
transcribed, translated and processed in the host organism to
produce functional proteins that in turn generate the
metabolites of interest.

According to the present invention, gene expression
libraries comprising complete naturally occurring biochemical
pathways or substantial portions thereof can greatly
facilitate searches for donor multi-enzyme systems
responsible for making compounds or providing activities of
interest. Genes that are involved in a particular
biochemical pathway can be conveniently isolated and
characterized in a single expression construct or clone. A
typical arrangement of such an expression construct is shown
in Figure 1.

Once a desirable activity or compound is identified,
this convenient feature can greatly facilitate downstream
drug development efforts, such as strain improvement and
process development. The positive clone can be cultured
under standard conditions to produce the desired compound in
substantial amounts for further studies or uses. The genes
of the biochemical pathway are immediately available for
sequencing, mutation, expression, and further rounds of
screening. The cloned biochemical pathway is readily
amenable to traditional and/or genetic manipulations for
overproduction of the desired compound.

Furthermore, biochemical pathways that are otherwise
silent or undetectable in the donor organism may be
discovered more easily by virtue of their functional
reconstitution in the host organism. Since the biochemical
characteristics of the host organism are well known, many
deviations resulting from expression of donor genetic
material can readily be recognized. Novel compounds may be
detected by comparing extracts of a host organism containing
donor genetic material against a profile of compounds known
to be produced by the control host organism under a given set
of environmental conditions. Even very low levels of a
desirable activity or compound may be detected when the
biochemical and cellular background of the host organism is
well characterized. As described in later sections, the
present invention provides methods for detecting and
isolating clones that produce the desirable activity or class
of compounds.
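
The comparison of a clone extract against the profile of the control host organism can be illustrated with a simple peak-matching step. The following is only a minimal sketch, assuming that each extract has already been reduced to a list of (retention time, intensity) peaks from some chromatographic analysis; the function name, the tolerance value and the peak data are hypothetical and are not taken from the methods described herein.

```python
def novel_peaks(clone_peaks, control_peaks, rt_tolerance=0.1, min_intensity=0.0):
    """Return peaks of the clone extract that have no counterpart in the
    control host profile within rt_tolerance (same units as retention time)."""
    control_times = [rt for rt, _ in control_peaks]
    novel = []
    for rt, intensity in clone_peaks:
        if intensity < min_intensity:
            continue  # ignore peaks below the chosen detection threshold
        # A peak is treated as novel only if every control peak is farther
        # away than the tolerance.
        if all(abs(rt - c) > rt_tolerance for c in control_times):
            novel.append((rt, intensity))
    return novel


# Hypothetical profiles: (retention time in minutes, peak intensity)
control = [(2.10, 850.0), (5.40, 1200.0), (9.75, 300.0)]
clone = [(2.12, 830.0), (5.38, 1190.0), (7.05, 45.0), (9.80, 310.0)]

candidates = novel_peaks(clone, control)
print(candidates)
```

Because the background of a well-characterized host is reproducible, even a low-intensity peak such as the hypothetical one at 7.05 minutes stands out once the shared peaks are subtracted.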

In a preferred embodiment, the methods may be applied to
donor organism(s) that cannot be recovered in substantial
amounts in nature, or cultured in the laboratory. By cloning
genetic material from such organisms into a host organism,
the organisms' metabolic pathways can be reproduced, and
their products tested efficiently for any desirable
properties. Thus, the genetic diversity of these organisms
is captured and preserved.

In another embodiment of the invention, a combinatorial
chimeric pathway gene expression library can be constructed
in which the genetic materials from one or multiple donor
organisms are randomly concatenated prior to introduction
into the host organism. Thus, each host organism in the
library may individually contain a unique, random combination
of genes derived from the various donor pathways or
organisms. Figure 2 shows the arrangement of genes and
regulatory regions in an expression construct of a
combinatorial chimeric pathway gene expression library. For
the most part, such combinations of genes in the library do
not occur in nature. Upon expression, the functional gene
products of the various donor pathways or organisms interact
with each other in individual host organisms to generate
combinations of biochemical reactions which result in novel
chimeric metabolic pathways and/or production of novel
compounds. Collectively, the genetic resources of the donor
organisms in the library are translated into a diversity of
chemical compounds that may not be found in individual donor
organisms.
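
The idea of random concatenation can be pictured with a small simulation. The sketch below is purely illustrative and rests on stated assumptions: gene fragments are represented by labels, each clone receives a fixed number of fragments drawn at random (with replacement) from the pooled donor material, and the donor names, fragment names and function names are all hypothetical.

```python
import random


def build_chimeric_library(donor_fragments, n_clones, fragments_per_clone, seed=None):
    """Simulate a library in which each clone carries a random
    concatenation of fragments drawn from the pooled donor material."""
    rng = random.Random(seed)
    # Pool all fragments, remembering which donor each one came from.
    pool = [(donor, frag)
            for donor, frags in donor_fragments.items()
            for frag in frags]
    library = []
    for _ in range(n_clones):
        # Sampling with replacement: the same fragment may occur twice.
        insert = [rng.choice(pool) for _ in range(fragments_per_clone)]
        library.append(insert)
    return library


def is_chimeric(insert):
    """True if the concatenated insert combines fragments of more than one donor."""
    return len({donor for donor, _ in insert}) > 1


# Hypothetical donors and gene fragments
donors = {
    "donor_A": ["A_gene1", "A_gene2", "A_gene3"],
    "donor_B": ["B_gene1", "B_gene2"],
    "donor_C": ["C_gene1", "C_gene2", "C_gene3"],
}

library = build_chimeric_library(donors, n_clones=1000, fragments_per_clone=4, seed=1)
chimeric_fraction = sum(is_chimeric(ins) for ins in library) / len(library)
print(f"fraction of clones combining several donors: {chimeric_fraction:.2f}")
```

In such a model most clones combine fragments of several donors, which is the property that allows gene products of different origins to meet in a single host organism.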

In another aspect of the invention, the species of donor
organisms may be selected on the basis of their biological
characteristics, or ability to carry out desirable but
uncharacterized biochemical reactions that are complementary
to the host organism. Such desirable characteristics may
include, but are not limited to, the capability to utilize
certain nutrients, to survive under extreme conditions, to
derivatize a chemical structure, and the ability to break
down or catalyze formation of certain types of chemical
linkages. When genes of the donor organism are expressed in
the host organism, the donor gene products can modify and/or
substitute for the functions of host gene products that
constitute host metabolic pathways, thereby generating novel
hybrid pathways. Novel activities and/or compounds may be
produced by hybrid pathways comprising donor and host-derived
components. The target metabolic pathway modified by donor
gene products may be native to the host organism.
Alternatively, the target metabolic pathway may be provided
by products of heterologous genes which are endogenous or
have been genetically engineered into every host organism
prior to or contemporaneously with construction of the gene
expression library. Thus, the present invention also
embodies constructing and screening gene expression
libraries, wherein DNA fragments encoding metabolic pathways
of donor organisms are cloned and coexpressed in host
organisms containing a target metabolic pathway.

In another embodiment of the invention, the host
organism may have an enhanced complement of active drug
efflux systems which secrete the compounds of interest into
the culture medium, thus reducing the toxicity of the
compounds to the host organism. Absorptive material, e.g.,
neutral resins, may be used during culturing of the host
organisms, whereby metabolites produced and secreted by the
host organism may be sequestered, thus facilitating recovery
of the metabolites.

In order to make the process of screening combinatorial
gene expression libraries more efficient, the present
invention further provides methods for detecting those host
organisms in the library that possess the activity or
compound of interest. In one embodiment of the invention,
the host organism contains a reporter system that will
respond to the presence of an introduced change, such as the
presence of the desirable compound or activity, by activating
the de novo synthesis of a reporter molecule. In another
embodiment, the host organism contains the precursor of a
reporter molecule, or a physiological probe, which is
converted to the reporter by the presence of the desirable
compound or activity. The reporter molecule in the positive
clone generates a signal which allows detection of the
positive clone in the expression library, as well as its
isolation from the other non-productive clones.

In many respects, the drug discovery system provides
significant convenience and time advantage to the various
steps of drug development up to clinical trials. The
libraries of the invention are compatible with the
established multi-well footprint format and robotics for
high-throughput screening. The host organisms of the
invention are organisms commonly used for genetic
manipulation and/or process development. The present
invention takes advantage of the fact that such host
organisms or production hosts are well-characterized in terms
of their biological properties and maintenance requirements.
By cloning genetic materials from a donor organism in other
more familiar expression systems, the need for difficult
culturing conditions for the donor organism is reduced.
Thus, the biological activities, the pharmacokinetic and
toxic properties of any lead compound discovered in the
system of the invention may be studied and optimized more
efficiently.

The novel metabolic pathway generated in a positive
clone can be delineated by standard techniques in molecular
biology. The lead compound may be synthesized by culturing a
clone of the drug-producing host organism under standard or
empirically determined culture conditions, so that sufficient
quantities of the lead compound may be isolated for further
analysis and development. There are already high purity
manufacturing protocols, such as Good Manufacturing Practice
(GMP), established for some of these standard industrial host
organisms. Unlike conventional methods of screening natural
product sources, less effort is required to adapt the
screening and production technologies to the particular
requirements of each potential drug-producing organism.

Moreover, once a positive clone has been identified in a
screening assay, the sequences in the clone encoding the
metabolic pathway or portions thereof can be isolated and
used as hybridization probes. Other gene libraries or
combinatorial gene expression libraries of the same donor
organism, or related organisms, may be screened with such
probes to isolate related genes in the natural pathway or
other genes in the same operon or combinatorial gene
expression construct. Such sequences may also be
reintroduced into and coexpressed in a host organism for
making gene expression libraries. The process of screening
and expression can be repeated to further increase the
genetic and chemical diversity in the combinatorial gene
expression libraries.

The present invention also provides specific
combinatorial gene expression libraries made according to the
methods of the invention from genetic materials of a
particular set of donor organisms and/or cell types. Not all
organisms or cell types in a set, especially mixed samples,
need to be individually identified or characterized to enable
preparation of the combinatorial gene expression libraries.

The invention further provides archival or mobilizable
combinatorial gene expression libraries in which the genetic
material of the donor organisms can be transferred from one
species of host organism to another species or strain of host
organism. These libraries are particularly useful when
genetic material of the donor organism is unique, rare or
difficult to prepare, or when it is desirable to obtain
expression of the donor genetic material in many different
species or strains of host organism. Where the cloning
vector contains the appropriate origin(s) of replication and
selection mechanism(s), and/or origin(s) of transfer, the
genetic material in the library can be transferred, and be
stably maintained and expressed in other species or strains
of host organism. The transfer can be effected by, for
example, isolation of the expression constructs and
introducing the constructs into another host organism, by any
means, such as but not limited to transformation,
transfection, infection or electroporation. Alternatively,
the transfer can be effected by bacterial conjugation between
appropriate host organisms.

The invention further provides recombined combinatorial
gene expression libraries in which the genetic materials of a
plurality of species of organisms are manipulated by
homologous or homeologous recombination before their use in
constructing the libraries. The process of homologous or
homeologous recombination can be carried out in vivo within a
live cell, or in vitro in a reaction containing cell extracts
and/or isolated recombination enzymes. By facilitating
random exchanges of DNA segments of different origins
selectively in regions where there are sequence similarities,
the resulting pool of DNA comprises recombined genes that
encode products with altered and/or novel properties, and
gene clusters comprising a novel repertoire of genes.
Optionally, the starting genetic materials can be preselected
for a specific property before recombination; the preselected
DNA may display sequence similarities to nucleotide sequences
that encode proteins that participate in a metabolic pathway
of interest. These recombined combinatorial gene expression
libraries are highly efficient in generating diversity within
a gene cluster that encodes functionally-related proteins
that form a part of a metabolic pathway of interest. Exchange
of DNA facilitated by homologous or homeologous recombination
may preserve the translation reading frames of large genes
and reduce the chance of complete disruption of the
interactions of multifunctional and/or multimeric
biosynthetic enzymes. Generally, the number of clones that
needs to be screened for a biological activity of interest
may be smaller than that required for other libraries, such
as those generated by random DNA fragmentation and PCR-based
assembly (Stemmer, 1994, Nature 370:389-91). These
recombined combinatorial gene expression libraries can be
screened by traditional methods and methods provided by the
invention.

Any combinatorial gene expression library of the
invention may be amplified, replicated, and stored.
Amplification refers to culturing the initial host organisms
containing donor DNA so that multiple clones of the host
organisms are produced. Replication refers to picking and
growing of individual clones in the library. A combinatorial
gene expression library of the invention may be stored and
retrieved by any technique known in the art that is
appropriate for the host organism. Thus, the libraries of
the invention are an effective means of capturing and
preserving the genetic resources of donor organisms, which
may be accessed repeatedly in a drug discovery program.

5.1. PREPARATION OF COMBINATORIAL GENE
EXPRESSION LIBRARIES



5.1.1. DONOR ORGANISMS 
Any organism can be a donor organism for the purpose of
preparing a combinatorial gene expression library of the
invention. The donor organisms may be obtained from private
or public laboratory cultures, or culture deposits, such as
the American Type Culture Collection, the International
Mycological Institute, or from environmental samples either
cultivable or uncultivable.

The donor organism(s) may have been a traditional source
of drug leads, such as terrestrial bacteria, fungi and
plants. The donor organisms may be transgenic, genetically
manipulated or genetically selected strains that have been
useful in generating and/or producing drugs.

The donor organism(s) may or may not be cultivable with
current state-of-the-art microbiological techniques; e.g., the
genetic material used to prepare the libraries can be
obtained directly from an environmental sample. Since only a
minority (1%) of the microbes found in nature can be
cultured in the laboratory, the major advantage of the
present invention is that the donor organism does not have to
be cultivable to be utilized herein (Torsvik et al. 1990,
Appl Env Micro, 56:782-787).

The invention is not limited to the use of
microorganisms as donors. Plants produce an enormous range
of compounds, some with dramatic activities on both animals
and microorganisms, for example, phytoalexins (Abelson 1990,
Science 247:513). Some of these compounds are inducible by
wounding or elicitors derived from the cell walls of plant
pathogens (Cramer et al. 1985, EMBO J. 4:285-289; Cramer et
al. 1985, Science 227:1240-1243; Dron et al. 1988, Proc.
Natl. Acad. Sci. USA 85:6738-6742). Biologically-active
compounds, like taxol, camptothecin, and artemisinin are
examples of plant-derived natural products which are
undergoing clinical development respectively as anti-tumor
and anti-malarial agents. Any plants, especially those with
potential medicinal properties, may be desirable donor
organisms (Phillipson, 1994, Trans R Soc Trop Med Hyg, 88
Suppl 1:S17-9; Chadwick et al. eds, 1994, in "Ethnobotany and
the search for new drugs", Wiley, Chichester, Ciba Foundation
Symp 185).

Other sources of natural products with potentially
useful antimicrobial or pharmacological properties are
invertebrates and vertebrates. Some of these compounds serve
as chemical defenses against competitors, pathogens and
predators. Such compounds may also be used to kill prey or
used as a form of communication (Caporale 1995, Proc Natl
Acad Sci 92:75-82). In numerous cases, the secondary
metabolites are thought to be produced by associated microbes
that may be symbiotic (Faulkner et al. 1993, Gazzetta Chimica
Italiana, 123:301-307; Bewley et al. 1995, in "An Overview of
Symbiosis in Marine Natural Products Chemistry Symposium" in
honor of Professor Antonio Gonzalez, La Laguna University,
Canary Islands, September 16, 1995, p26 (abstract)).

Organisms known to manipulate biochemical pathways of
other organisms in nature are sources of particular interest;
e.g., certain plants, such as Cycas, can produce an
ecdysone-mimic which disrupts the development of certain insects.
Such organisms may live in the same ecological niche where
they exist as competitors, symbionts, predator and prey, or
host and parasite. Thus, it may be advantageous to use
genetic materials derived from organisms that interact
chemically with others in nature.

Yet another rich source of natural products is marine
organisms. For instance, marine microbes produce novel
molecular structures, many of which are bioactive, e.g.,
octalactin A which is a potential anti-cancer agent with a
molecular structure not previously seen in terrestrial
bacteria (Tapiolas et al. 1991, J Amer Chem Soc, 113:4682-83);
and salinamides (Trischman et al. 1994, J Amer Chem Soc
116:757-758) which have potent anti-inflammatory properties.
Certain compounds derived from marine microorganisms contain
bromine from seawater which renders the compounds highly
active because of the chemical reactivity of the incorporated
halogen, e.g., marinone (Pathirana et al. 1992, Tetrahedron
Lett 33:7663-7666), a product of mixed polyketide and
mevalonic acid biosynthetic pathways, which has selective
antibiotic activity against gram positive bacteria. There is
a vast diversity of marine species which live in a range of
habitats, from polar to tropical regions, with different
salinities, temperatures and pressures. The unique nature of
these habitats is reflected in the distinct genetics and
biochemistry of these organisms, and may provide many useful
drug leads. See, for example, Fenical et al. 1992, in
"Marine Microorganisms; a new biological resource", Adv in
Marine Biotechnol, Vol. I, Plenum Press, New York.

Environmental samples may be obtained from natural or
man-made environments, and may contain a mixture of
prokaryotic and eukaryotic organisms, and viruses, some of
which may be unidentified. Samples can either be randomly
collected or collected from areas that are ecologically
stressed, for example, near an industrial effluent. Soil,
freshwater or seawater filtrates, deposits around hot springs
or thermal vents, and marine or estuarine sediments may be
used as sources of donor organisms. Samples may be collected
from benthic, pelagic, and intertidal marine sources.
Samples may be collected from tropical, subtropical,
temperate and other regions. The donor organisms may be
thermophilic, halophilic, acidophilic, barophilic, or
methanogenic.

It is also preferable to use organisms that are facing
the possibility of extinction, such as those plants and
microorganisms found in the tropical rain forest. Insofar as
such habitats are being destroyed, species are being lost
that might yield useful medicines.

Organisms with potential medicinal properties, including
algae, lichens, fungi, plants, and animals, may also be
collected on the basis of their uses in traditional or ethnic
medical practices.

In many aspects, it is desirable that the library is
constructed with genetic material derived from donor
organisms that are not generally amenable to traditional drug
discovery or development technologies. Such donor organisms
may have one or more of the following characteristics:
(i) the organism cannot be propagated or cultured in the
laboratory; (ii) the organism cannot be recovered from nature
in amounts sufficient for further experiments; and/or
(iii) the organism requires special conditions for production
of the desirable compound that are unknown or are not
commercially reasonable. The latter characteristics also
describe organisms in extant culture collections, where no
drug leads may have been detected in conventional screening
processes due to inappropriate culture conditions.

For the purpose of constructing an expression library,
the donor organisms need not be taxonomically defined or
biochemically characterized. Identification or genetic
footprinting of a cultivable species or a representative
group of species from an environmental sample may be
performed depending on the complexity of the sample and the
needs of the drug discovery program, such as, for example, a
requirement for donor species dereplication.

The donor organisms may be concentrated or cultured in
the laboratory or field prior to extraction of their nucleic
acids. For preparing cDNAs, specific growth conditions or
the presence of certain chemicals in the culture may be
required to induce or enhance the transcription of genes
encoding the activities of interest in the donor
organisms. Standard growth conditions may be used to culture
the organisms if only genomic DNA is required.

Since it is unlikely that all donor organisms in an
environmental sample may be propagated at the same rate, if
at all, under laboratory conditions, some of the donor
organisms may overgrow and lead to the loss or dilution of
slow-growing organisms. Thus, it may be preferable to
prepare nucleic acids directly from donor organisms in an
environmental sample without prior culturing in the
laboratory. This may be especially useful when attempting to
access the secondary metabolites of invertebrates such as
marine sponges, where the metabolites are often believed to
be produced by the associated symbiotic and uncultivable
microbes. Methods for preparing high quality nucleic acids
from donor organisms in environmental samples are provided
below in Section 5.1.2.
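
The dilution of slow-growing organisms during culturing can be made concrete with a short calculation. The sketch below assumes idealized, unrestricted exponential growth of two organisms in a mixed culture; the doubling times, starting amounts and function name are hypothetical values chosen only for illustration.

```python
def slow_fraction(fast_doubling_h, slow_doubling_h, hours,
                  fast_start=1.0, slow_start=1.0):
    """Fraction of the mixed culture contributed by the slow-growing
    organism after the given number of hours of exponential growth."""
    fast = fast_start * 2 ** (hours / fast_doubling_h)
    slow = slow_start * 2 ** (hours / slow_doubling_h)
    return slow / (fast + slow)


# Hypothetical case: equal starting amounts, doubling times of 1 h and 4 h
for hours in (0, 8, 24):
    share = slow_fraction(1.0, 4.0, hours)
    print(f"after {hours:2d} h the slow grower is {share:.2e} of the culture")
```

Under these assumptions the slow grower falls from half of the culture to a few parts per million within a day, so its genetic material would be correspondingly rare in nucleic acids prepared from the culture.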

Donor organisms contemplated by the invention may
include, but are not limited to viruses; bacteria;
unicellular eukaryotes, such as yeasts and protozoans; algae;
fungi; plants; tunicates; bryozoans; worms; echinoderms;
insects; mollusks; fishes; amphibians; reptiles; birds; and
mammals. Non-limiting examples of donor organisms are listed
in Tables I and II.

Table I: List of exemplary bacterial and fungal donor
organisms (Berdy 1974, Adv Appl Microbiol, 18:309-406;
Goodfellow et al. 1989, in "Microbial Products: New
Approaches", Cambridge University Press 343-383)

Group                  Genera

Bacteria
  Actinomycetales      Streptomyces, Micromonospora,
                       Nocardia, Actinomadura,
                       Actinoplanes, Streptosporangium,
                       Microbispora, Kitasatosporia
  Eubacteriales        Azobacterium, Rhizobium,
                       Achromobacterium, Enterobacterium,
                       Brucella, Micrococcus,
                       Lactobacillus, Bacillus,
                       Clostridium, Brevibacterium
  Pseudomonadales      Pseudomonas, Aerobacter, Vibrio,
                       Halobacterium
  Mycoplasmatales      Mycoplasma
  Myxobacteriales      Cytophaga, Myxococcus

Fungi
  Myxothallophytes     Physarum, Fuligo
  Phycomycetes         Mucor, Phytophthora, Rhizopus
  Ascomycetes          Aspergillus, Penicillium
  Basidiomycetes       Coprinus, Phanerochaete
  Fungi Imperfecti     Acremonium (Cephalosporium),
                       Trichoderma, Helminthosporium,
                       Fusarium, Alternaria, Myrothecium

Yeasts                 Saccharomyces

Table II: Higher forms of exemplary donor organisms

Group                  Exemplary Genera, Compounds & Properties

Plants
  Algae                Digenea simplex (kainic acid,
                       antihelminthic)
                       Laminaria angustata (laminine,
                       hypotensive)
  Lichens              Usnea fasciata (vulpinic acid,
                       antimicrobial; usnic acid, antitumor)
  Higher Plants        Catharanthus (Vinca alkaloids),
                       Digitalis (cardiac glycosides),
                       Podophyllum (podophyllotoxin),
                       Taxus (taxol), Cephalotaxus
                       (homoharringtonine), Camptotheca
                       (camptothecin), Artemisia
                       (artemisinin), Coleus (forskolin),
                       Desmodium (K channel agonist)

Protozoa
  Dinoflagellates      Ptychodiscus brevis (brevitoxin,
                       cardiovascular)

Insects                Dolomedes ("fishing spider" venoms),
                       Epilachna (Mexican bean beetle
                       alkaloids)

Bryozoans              Bugula neritina (bryostatins,
                       anti-cancer)

Molluscs               Conus toxins

Sponges                Microciona prolifera (ectyonin,
                       antimicrobial), Cryptotethya crypta
                       (D-arabino furanosides)

Corals                 Pseudoterogonia species
                       (Pseudoteracins, anti-inflammatory),
                       Erythropodium (erythrolides,
                       anti-inflammatory)

Worms
  Annelida             Lumbriconereis heteropa (nereistoxin,
                       insecticidal)
  Spinunculida         Bonellia viridis (bonellin,
                       neuroactive)

Tunicates              Trididemnum solidum (didemnin,
                       anti-tumor and anti-viral),
                       Ecteinascidia turbinata
                       (ecteinascidins, anti-tumor)

Fish                   Eptatretus stoutii (eptatretin,
                       cardioactive), Trachinus draco
                       (proteinaceous toxins, reduce blood
                       pressure, respiration and reduce
                       heart rate)

Amphibians             Dendrobatid frogs (batrachotoxins,
                       pumiliotoxins, histrionicotoxins,
                       and other polyamines)

Reptiles               Snake venom toxins

Birds                  histrionicotoxins, modified
                       carotenoids, retinoids and steroids
                       (Goodwin 1984 in "The Biochemistry of
                       the Carotenoids" Vol. II, Chapman and
                       Hall, New York, pp. 160-168)

Mammals                Ornithorhynchus anatinus (duck-billed
                       platypus venom), modified carotenoids,
                       retinoids and steroids (Goodwin 1984,
                       supra, pp. 173-185; Devlin 1982 in
                       "Textbook of Biochemistry" Wiley,
                       New York, p. 750)



5.1.2. PREPARATION OF HIGH QUALITY NUCLEIC ACIDS
FROM DONOR ORGANISMS

Nucleic acids may be isolated from donor organisms by a
variety of methods depending on the type of organisms and the
source of the sample. It is important to obtain high quality
nucleic acids that are free of nicks, single stranded gaps,
and partial denaturation, and are of high molecular weight
(especially for genomic DNA cloning), in order to construct
gene expression libraries that are fully representative of
the genetic information of donor organisms. To prepare high
quality nucleic acid, the methods of the invention provide
gentle, rapid and complete lysis of donor organisms in the
sample, and rapid and complete inactivation of nucleases and
other degradative proteins from the organisms. Initial
extraction may be carried out in the field to stabilize the
nucleic acids in the sample until further isolation steps can
be performed in the laboratory.

Any nucleic acid isolation procedure requires efficient 
breakage of the donor organism. A number of standard 
techniques may be used, including freezing in liquid 
nitrogen, grinding in the presence of glass or other 
disruptive agents, as well as simple mechanical shearing or 
enzymatic digestion.

For mixed materials such as soil, or for samples that 
contain high amounts of tough materials, such as cellulose or 
chitin (as in filamentous fungi and plants, for instance), 
freeze-drying may be employed to render the samples fragile, 
thus making them more amenable to disruption. Such 
lyophilized materials preserve both enzymatic and high
molecular weight materials (such as nucleic acids) for long
periods (Gurney 1984, in Methods in Molecular Biology, Vol.
2, pp. 35-42, John M. Walker ed.). Samples may be flash frozen
in liquid nitrogen. Samples that are loose, such as soil,
can be frozen in fine gauze or nylon mesh. Lyophilization
can be carried out on frozen samples under vacuum for a
period of 24-72 hours. Freeze-dried samples can be stored
desiccated under vacuum at -70°C. Additional steps may be
required for preparation of environmental samples, such as
concentration of microbial populations (Jacobson et al. 1982,
Appl Env Microbiol, 58:2458-2462; Zhou et al. 1996, Appl Env
Microbiol, 62:316-322; Somerville et al. 1989, Appl Env
Microbiol, 55:548-554).

One principal method of the present invention, though 
certainly not the only one to be used, is modified from 
Chirgwin et al. (1979, Biochem 24:5294), Sadler et al. (1992 
Curr Genet, 21:409-416) and Foster (1991, Ph.D. thesis, 
University of California, Santa Barbara). The method uses
the strong chaotropic agent, guanidinium isothiocyanate, with
2-mercaptoethanol to denature proteins and inactivate
nucleases, followed by purification of the nucleic acid
material by cesium chloride gradient centrifugation. The
method provided herein differs from Chirgwin's method in that
both DNA and RNA are extracted. Also included in the method
of the invention is a high speed centrifugation step, and the
optional addition of bisbenzimide dye. Depending on the
donor organism used, additional steps may include, but are
not limited to, treatment with hexadecylpyridinium chloride
or cetyltrimethylammonium bromide (CTAB) to selectively
remove polysaccharides, treatment with polyvinylpyrrolidone
for removal of phenolics, and cellulose chromatography for
removal of starch and other carbohydrates (Murray & Thompson,
1980, Nuc Acid Res 8:4321-25).
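
As an illustrative sketch only, the optional clean-up steps named above can be recorded as a lookup from contaminant class to treatment. The names `CLEANUP_TREATMENTS` and `cleanup_steps` and the contaminant labels are hypothetical; the treatments themselves are the ones listed in the text, and alternatives for one contaminant are listed together.

```python
# Hypothetical lookup: contaminant class -> treatments named in the text.
CLEANUP_TREATMENTS = {
    "polysaccharides": [
        "hexadecylpyridinium chloride",
        "cetyltrimethylammonium bromide (CTAB)",
    ],
    "phenolics": ["polyvinylpyrrolidone"],
    "starch": ["cellulose chromatography"],
}


def cleanup_steps(contaminants):
    """Return candidate treatments for the expected contaminants, in input order.

    Unknown contaminant labels are ignored; duplicates are listed once.
    """
    steps = []
    for contaminant in contaminants:
        for treatment in CLEANUP_TREATMENTS.get(contaminant, []):
            if treatment not in steps:
                steps.append(treatment)
    return steps


if __name__ == "__main__":
    # A plant sample might be expected to carry phenolics and starch.
    print(cleanup_steps(["phenolics", "starch"]))
```

Which treatments are actually needed depends on the donor organism and would be decided at the bench; the table only organizes the options.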

RNA isolated from donor organisms can be converted into 
complementary DNA (cDNA) using reverse transcriptase. 
Damaged nucleic acid may be difficult to clone, resulting
in loss of donor organism DNA and low numbers of clones in a
library. The problem can be worsened if the host organism is
permissive for recombination and lacks effective endogenous
DNA repair mechanisms. The present invention also provides
that damaged DNA can be repaired in vitro prior to cloning,
using enzymatic reactions commonly employed during second
strand synthesis of complementary DNA (Sambrook et al. 1989,
in "Molecular Cloning", 2nd Edition, section 8). For example,
DNA gaps and nicks may be repaired by the Klenow fragment of
DNA polymerase, and E. coli DNA ligase. Such enzymatic
reactions are well known to those skilled in the art.

When preparing a combinatorial expression library from
DNA extracted from environmental samples, the quantity of
available DNA is often limited, and is a consideration in the
selection of ligation strategy. If the quantity is low after
extraction or concatenation (<100 µg), the DNA may be ligated
into a high-efficiency cloning system, e.g., SuperCos, as
described in Section 5.1.3. The inserts in the clones are
amplified and are released from the vector by restriction
enzyme digestion. Due to the nature of environmental DNA
samples, which may contain both prokaryotic and eukaryotic
donor organisms, it may be desirable to use multiple host
organisms. If a sufficient amount of the original environmental
DNA sample is available, or if the DNA has been amplified,
the DNA may be ligated to each of a panel of vectors
appropriate for the desired panel of expression host cells.
Preferably, the vectors have the capacity to shuttle between
two or more expression hosts.
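
The choice of ligation strategy described above can be sketched as a small decision function. This is a minimal illustration, not the method of the invention: the names `LigationStrategy` and `choose_ligation_strategy` are hypothetical, and the 100 µg threshold is an assumption read from the quantity quoted in the text, so it is exposed as a parameter.

```python
from enum import Enum


class LigationStrategy(Enum):
    # Low DNA quantity: clone first in a high-efficiency system, then amplify.
    HIGH_EFFICIENCY_COSMID = "high-efficiency cloning system (e.g., SuperCos)"
    # Ample or amplified DNA: ligate to one vector per expression host.
    VECTOR_PANEL = "panel of vectors for the panel of expression hosts"


def choose_ligation_strategy(dna_ug, amplified=False, low_threshold_ug=100.0):
    """Pick a ligation strategy from the available DNA quantity in micrograms.

    low_threshold_ug is an assumed cut-off; adjust it to the actual protocol.
    """
    if dna_ug < 0:
        raise ValueError("DNA quantity cannot be negative")
    if amplified or dna_ug >= low_threshold_ug:
        return LigationStrategy.VECTOR_PANEL
    return LigationStrategy.HIGH_EFFICIENCY_COSMID


if __name__ == "__main__":
    print(choose_ligation_strategy(8.0).value)
    print(choose_ligation_strategy(8.0, amplified=True).value)
```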



5.1.3. HOST ORGANISMS AND VECTORS

The term "host organism" as used herein broadly 
encompasses unicellular organisms, such as bacteria, and 
multicellular organisms, such as plants and animals. Any 
cell type may be used, including those that have been 
cultured in vitro or genetically engineered. Any host-vector
system known in the art may be used in the present
invention. The use of shuttle vectors that can be replicated
and maintained in more than one host organism is
advantageous.

Host organisms or host cells may be obtained from 
private laboratory deposits, public culture collections such 
as the American Type Culture Collection, or from commercial
suppliers. Such host organisms or cells may be further
modified by techniques known in the art for specific uses.

According to the invention, it is preferable that the
host organism or host cell has been used for expression of
heterologous genes, and is reasonably well characterized
biochemically, physiologically, and/or genetically. Such
host organisms may have been used with traditional genetic
strain improvement methods, breeding methods, fermentation
processes, and/or recombinant DNA techniques. It is
desirable to use host organisms which have been developed for
large-scale production processes, and for which conditions for
growth and for production of secondary metabolites are known.

The host organisms may be cultured under standard
conditions of temperature, incubation time, optical density,
and media composition corresponding to the nutritional and
physiological requirements of the expression host. However,
conditions for maintenance and production of a library may be
different from those for expression and screening of the
library. Modified culture conditions and media may also be
used to emulate some nutritional and physiological features
of the donor organisms, and to facilitate production of
interesting metabolites. For example, chemical precursors of
interesting compounds may be provided in the nutritional
media to facilitate modifications of those precursors. Any
techniques known in the art may be applied to establish the
optimal conditions.

The host organism should preferably be deficient in the
abilities to undergo homologous recombination and to restrict
foreign DNA. The host organism should preferably have a
codon usage similar to that of the donor organism. If
eukaryotic donor organisms are used, it is preferable that
the host organism has the ability to process the donor
messenger RNA properly, e.g., splice out introns.
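
One way to put a number on "similar codon usage" is to compare codon frequency profiles of donor and host coding sequences. The sketch below uses cosine similarity, which is a choice made here for illustration and is not prescribed by the text; the function names are hypothetical, and real comparisons would use many genes per organism rather than a single sequence.

```python
from collections import Counter
from math import sqrt


def codon_frequencies(coding_sequence):
    """Relative frequency of each codon in an in-frame coding sequence."""
    seq = coding_sequence.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def codon_usage_similarity(donor_cds, host_cds):
    """Cosine similarity (0 to 1) between donor and host codon frequencies."""
    donor = codon_frequencies(donor_cds)
    host = codon_frequencies(host_cds)
    dot = sum(freq * host.get(codon, 0.0) for codon, freq in donor.items())
    norm = sqrt(sum(f * f for f in donor.values())) * sqrt(
        sum(f * f for f in host.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy sequences: both encode Met-Ala-Ala but use different alanine codons.
    print(codon_usage_similarity("ATGGCTGCT", "ATGGCCGCC"))
```

A score near 1 suggests the host's translational machinery sees a familiar codon mix; a low score flags donor and host pairs where expression may be limited by rare codons.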

Preferred prokaryotic host organisms may include but are
not limited to Escherichia coli, Bacillus subtilis,
Streptomyces lividans, Streptomyces coelicolor, Pseudomonas
aeruginosa, and Myxococcus xanthus. Yeast species such as
Saccharomyces cerevisiae (baker's yeast), Schizosaccharomyces
pombe (fission yeast), Pichia pastoris, and Hansenula
polymorpha (methylotrophic yeasts) may also be used.
Filamentous ascomycetes, such as Neurospora crassa and
Aspergillus nidulans, may also be used. Plant cells such as
those derived from Nicotiana and Arabidopsis are preferred.
Preferred mammalian host cells include but are not limited to
those derived from humans, monkeys and rodents, such as
Chinese hamster ovary (CHO) cells, NIH/3T3, COS, 293, VERO,
etc. (see Kriegler M. in "Gene Transfer and Expression: A
Laboratory Manual", New York, Freeman & Co. 1990).

A host organism may be chosen which modifies and 
processes the expressed gene products in a specific fashion 
as desirable. Such modifications (e.g., glycosylation) and 
processing (e.g., cleavage) of protein products may be 
important for the function of the protein in a biochemical
pathway. Different host cells have characteristic and
specific mechanisms for the post-translational processing and
modification of proteins. Appropriate cell lines or host
systems can be chosen to ensure the correct modification and
processing of the foreign protein expressed. To this end,
eukaryotic host cells which possess the cellular machinery
for proper and accurate processing of the primary transcript,
glycosylation, and phosphorylation of the gene product may be
preferred if the donor organism(s) are eukaryotic.

For example, it has been shown that eukaryotic fungi 
share much of the same core molecular biology, and that gene 
exchange is possible between many of the most common fungal
species (Gurr et al. 1987, in Gene Structure in Eukaryotic
Microbes, Kinghorn ed., p. 93; Bennet & Lasure 1992, Gene
Manipulations in Fungi, Academic Press, NY). A preferred
example of a eukaryotic host organism is the fission yeast
Schizosaccharomyces pombe. First, the molecular biology of
S. pombe is highly developed and many major culture and
purification processes and manipulations are routinely
performed. Second, it is unicellular, and thus can easily be
cultured, stored, and manipulated in a laboratory setting.
Third, and of particular importance for use in expressing
mixed eukaryotic DNAs, it is capable of properly splicing and
expressing genes of other species of fungi, plants, and
mammals. Studies of the splicing and processing of
heterogeneous nuclear RNA (RNA which contains introns) have
indicated that S. pombe shares with other fungi and higher
metazoans a remarkable similarity of pattern and structure of
small nuclear RNA (snRNA) components needed for splicing.
Finally, many non-S. pombe promoters, some of which derive from
mammalian and plant viruses, are capable of driving moderate
to high levels of gene expression (Forsburg, 1993, Nuc Acids
Res, 21:2955). This feature can allow the shuttling of a
fungal DNA/cDNA library to mammalian cell expression hosts
such as NIH3T3 (fibroblasts), GT1-7 (neuronal), or other cell
types.

A cloning vector or expression vector may be used to 
introduce donor DNA into a host organism for expression. An 
expression construct is an expression vector containing donor 
DNA sequences operably associated with one or more regulatory 
regions. The regulatory regions may be supplied by the donor 
DNA or the vector. A variety of vectors may be used which
include, but are not limited to, plasmids; cosmids;
phagemids; artificial chromosomes, such as yeast artificial
chromosomes (YACs) and bacterial artificial chromosomes
(BACs, Shizuya et al. 1992, Proc Natl Acad Sci 89:8794-8797);
or modified viruses, but the vector must be compatible with
the host organism. Non-limiting examples of useful vectors
are λgt11, pWE15, SuperCos 1 (Stratagene), pDblet (Brun et al.
1995, Gene, 164:173-177), pBluescript (Stratagene), CDM8,
pJB8, pYAC3, pYAC4 (see Appendix 5 of Current Protocols in
Molecular Biology, 1988, Ed. Ausubel et al., Greene Publish.
Assoc. & Wiley Interscience, which is incorporated herein by
reference). An exemplary cosmid vector, pPCos+ura, for
cloning and expression in Schizosaccharomyces pombe is
provided in Figure 13, and is deposited on October 24, 1996
at the Agricultural Research Service Culture Collection
(NRRL) at Agricultural Research Center, U.S. Department of
Agriculture, 1815 North University Street, Peoria, Illinois
61604, United States, and is given accession number B-21637N.

When the regulatory regions and transcription factors of
the host and donor organisms are compatible, donor
transcriptional regions will be able to bind host factors,
such as RNA polymerase, to effect transcription in the host
organism. If the donor and host organisms are not
compatible, regulatory regions compatible with the host
organism may be attached to the donor DNA fragment in order
to ensure expression of the cloned genes.

In cases where the entire operon, including its own 
translation initiation codon, ribosome binding regions, and 
adjacent sequences, is inserted into the appropriate cloning 
or expression vector, no additional control signals may be
needed. However, in cases where only a portion of the coding
sequence of a gene is inserted, exogenous control signals,
including the translation initiation codon (frequently ATG)
and adjacent sequences, must be provided. These exogenous
regulatory regions and initiation codons can be of a variety
of origins, both natural and synthetic. Both constitutive
and inducible regulatory regions may be used for expression
of the donor DNA. It is desirable to use inducible promoters
when the products of the expression library may be toxic. The
efficiency of the expression may be enhanced by the inclusion
of appropriate transcription enhancer elements (see Bittner
et al. 1987, Methods in Enzymol. 153:516-544).
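
Whether an insert carries its own initiation codon can be checked, in a very simplified way, by scanning for an ATG followed by an in-frame stop codon. The sketch below is an assumption-laden illustration: the function names are hypothetical, only the forward strand is scanned, and ribosome binding regions and other adjacent signals are not examined.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(sequence, min_codons=2):
    """Return (start, end) of the first ATG...stop open reading frame, or None.

    min_codons counts the codons before the stop codon, ATG included.
    Coordinates are 0-based and end-exclusive (the stop codon is included).
    """
    seq = sequence.upper()
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    return (start, pos + 3)
                break  # ORF too short; try the next ATG
    return None


def needs_exogenous_signals(insert):
    """True when no complete ATG...stop reading frame is found in the insert."""
    return find_orf(insert) is None


if __name__ == "__main__":
    print(find_orf("CCATGGCTTAACC"))             # complete toy ORF
    print(needs_exogenous_signals("GCTGCTGCT"))  # coding fragment without ATG
```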

"Operably-associated" refers to an association in which 
the regulatory regions and the DNA sequence to be expressed 
are joined and positioned in such a way as to permit 
transcription, and ultimately, translation. The precise 
nature of the regulatory regions needed for gene expression 
may vary from organism to organism. Generally, a promoter is 
required which is capable of binding RNA polymerase and 
promoting the transcription of an operably-associated nucleic 
acid sequence. Such regulatory regions may include those 5'
non-coding sequences involved with initiation of
transcription and translation, such as the TATA box, capping
sequence, CAAT sequence, and the like. The non-coding region
3' to the coding sequence may also be retained or replicated
for its transcriptional termination regulatory sequences,
such as terminators and polyadenylation sites. Two sequences
of a nucleic acid molecule are said to be
"operably-associated" when they are associated with each
other in a manner which either permits both sequences to be
transcribed onto the same RNA transcript, or permits an RNA
transcript begun in one sequence to be extended into the
second sequence. A polycistronic transcript may thus be
produced. Two or more sequences, such as a promoter and any
other nucleic acid sequences, are operably-associated if
transcription commencing in the promoter will produce an RNA
transcript of the operably-associated sequences. In order to
be "operably-associated" it is not necessary that two
sequences be immediately adjacent to one another.

In addition, the expression vector may contain 
selectable or screenable marker genes for initially 
isolating, identifying or tracking host organisms that 
contain donor DNA. Any antibiotic resistance genes, such as 
but not limited to ampicillin, kanamycin, chloramphenicol, 
apramycin or gentamycin (Brau et al., 1984, Mol Gen Genet 
193:179-187) and hygromycin (Hopwood et al., 1985, Genetic 
Manipulation of Streptomyces, A Laboratory Manual, The John
Innes Foundation, UK) can be used. Universal forward
selection based on plasmid stability in a bacterial host,
such as the parD/E system (Johnson et al., 1996, J Bacteriol,
178:1420-1429), can also be used in the absence of
antibiotic selection.

The expression vector may also provide unique or 
conveniently located restriction sites to allow severing 
and/or rearranging portions of the DNA inserts in an
expression construct. 
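
To illustrate what "unique" restriction sites means in practice, the following sketch counts occurrences of a few recognition sequences in a linear DNA string and reports the enzymes that cut exactly once. The recognition sequences are the standard ones for the enzymes named in this section; the function names and the toy sequence are hypothetical, and circular vectors or methylation sensitivity are not handled.

```python
# Recognition sequences (all palindromic, so one strand is enough to search).
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "KpnI": "GGTACC",
    "NotI": "GCGGCCGC",
    "SmaI": "CCCGGG",
}


def site_positions(sequence, site):
    """Return 0-based start positions of every occurrence of site."""
    seq = sequence.upper()
    positions = []
    index = seq.find(site)
    while index != -1:
        positions.append(index)
        index = seq.find(site, index + 1)
    return positions


def unique_sites(sequence, sites=RECOGNITION_SITES):
    """Return the enzymes (sorted by name) that cut the sequence exactly once."""
    return sorted(
        enzyme
        for enzyme, site in sites.items()
        if len(site_positions(sequence, site)) == 1
    )


if __name__ == "__main__":
    toy_vector = "AAGGATCCTTTGAATTCAAGAATTCGG"  # made-up sequence
    print(unique_sites(toy_vector))
```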

The expression vector may contain sequences that permit 
maintenance and/or replication of the vector in one or more 
host organisms, or integration of the vector into the host
chromosome. Such sequences may include but are not limited
to replication origins, autonomously replicating sequences
(ARS), centromere DNA, and telomere DNA. It may also be
advantageous to include in the expression vector host
organism sequences or homologous sequences, especially those
that are actively transcribed in the host. Such sequences
may facilitate integration of the expression construct into
the host chromosome, especially when they are found in
positions flanking the cloning site in the cloning vector.
The expression construct may be integrated in the host genome
or remain episomal in the host organism. As a result, one or
more copies of an expression construct may be generated and
maintained in a host organism.

Generally, it may be advantageous to use shuttle vectors 
which can be replicated and maintained in at least two host
organisms, such as, for example, bacteria and mammalian
cells, bacteria and yeasts, bacteria and plant cells, or gram
positive and gram negative bacteria. A shuttle vector of the
invention is capable of replicating in different species or
strains of host organisms, and may contain one or more
origins of replication that determine the range of host
organisms in which the vector can stably maintain itself and
undergo replication in concert with cell growth. In
prokaryotes, for example, if a broad host range plasmid
replication origin is present, the shuttle vector will be
capable of stable inheritance in a very wide range of
bacteria; e.g., the origins of replication of RK2 (Pansegrau
et al., 1994, J Mol Biol 239:623-663) or pBBR (Kovach et al.,
1994, BioTechniques 16:800-801) are functional in many
gram-negative bacteria, such as Pseudomonas, Agrobacterium,
Escherichia, and Rhizobium. Many of the bacteria that harbor
DNA comprising a broad host range origin of replication are
known to produce metabolites of interest. An origin of
replication that is functional in a relatively limited range
of related hosts can also be used, e.g., the replication
origin of pAJcijl which functions in four actinomycete genera
(J Gen Microbiol 131:2431-2441). Alternatively, a shuttle
vector of the invention can comprise two or more replication
origins each having a narrowly defined range that permits the
vector to be replicated and maintained in the respective
hosts, e.g., E. coli and Bacillus. Any origin of replication
derived from IncP, IncQ or IncW plasmids can be used in a
vector of the invention. A bacteriophage origin of
replication, e.g., the f1 origin of M13 phage, can also be
present in the vector. The coliphage origin of replication
can facilitate production of single stranded forms of the
expression constructs useful for various purposes, such as
but not limited to transformation and hybridization.
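
The relationship between replication origins and host range can be sketched as a simple lookup. The table below lists only host genera that this section itself associates with each origin and is not a complete host range; the names `ORIGIN_HOST_RANGE`, `host_range` and `is_shuttle_vector` are hypothetical.

```python
# Illustrative host ranges, limited to genera named in this section.
GRAM_NEGATIVE_EXAMPLES = {"Pseudomonas", "Agrobacterium", "Escherichia", "Rhizobium"}

ORIGIN_HOST_RANGE = {
    "RK2": set(GRAM_NEGATIVE_EXAMPLES),   # broad host range origin
    "pBBR": set(GRAM_NEGATIVE_EXAMPLES),  # broad host range origin
    "ColE1": {"Escherichia"},             # narrow range, E. coli
    "SCP2*": {"Streptomyces"},            # from Streptomyces coelicolor
}


def host_range(origins):
    """Union of the host genera supported by the given replication origins."""
    hosts = set()
    for origin in origins:
        hosts |= ORIGIN_HOST_RANGE.get(origin, set())
    return hosts


def is_shuttle_vector(origins):
    """A vector is treated as a shuttle vector if it replicates in two or more hosts."""
    return len(host_range(origins)) >= 2


if __name__ == "__main__":
    print(sorted(host_range(["ColE1", "SCP2*"])))
    print(is_shuttle_vector(["ColE1"]))
```

Combining two narrow-range origins (here ColE1 and SCP2*) and using a single broad-range origin (RK2) are the two routes to a shuttle vector that the paragraph describes.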

A shuttle vector of the invention may also comprise
cis-acting sequences derived from naturally-occurring
self-transmissible plasmids, which enable the plasmids to
transfer themselves from one species or strain of bacteria to
another by means of an interspecies conjugative process
(Hayman et al. 1993, Plasmid 30:251-257). Such sequences,
known as origins of transfer, are relatively small (e.g.,
200-800 bp) and can be inserted into a shuttle vector of the
invention to facilitate the transfer of the shuttle vector
between different species or strains of host organisms.
Conjugation is a natural process whereby large plasmids are
transferred between different species or strains of organism
via a conjugation tube at fairly high frequency. The
mobilization of a transfer origin-containing shuttle vector
is mediated by a specific set of transfer proteins which can
be provided by expression of functions integrated in the host
chromosome itself or in trans by a Tra helper plasmid (Ditta
et al., 1980, Proc. Natl. Acad. Sci. 77:7347-7351; Knauf et
al., 1982, Plasmid 8:45-54). Strains of E. coli harboring
integrated Tra functions, e.g., S17-1, are available from the
American Type Culture Collection.

By using a shuttle vector with the appropriate 
replication origins, transfer origin(s) and selection
mechanisms to construct a library, the DNA sequences of the
donor organisms in a library may readily be mobilized from 
one initial species of host organism to a variety of 
alternative species of host organisms where the donor DNA 
sequences can be stably maintained, replicated and expressed. 
Thus, mobilizable gene expression libraries that are 
constructed with a shuttle vector, and that can be mobilized 
into multiple host organisms by conjugation are within the 
scope of the invention. 

For instance, a preferred and exemplary expression
vector-host organism combination is the cosmid SuperCos 1
and the Escherichia coli strain XL1-Blue MR, both of which
are commercially available from Stratagene (La Jolla, CA).
The vector accepts, through a BamHI cloning site, DNA inserts
ranging from 30-42 kbp in size, and carries a neomycin
resistance marker (neoR) and an SV40 promoter that is used
for expression in mammalian cells. The vector also contains
an ampicillin resistance gene for selection in prokaryotic
cells. The E. coli host organism is deficient in certain
restriction systems (hsdR, mcrA, mcrCB and mrr), is
endonuclease-deficient (endA1), and recombination deficient
(recA). The host organism cannot cleave inserted DNA
carrying cytosine and/or adenine methylation, which is often
present in eukaryotic DNA and cDNA synthesized using
methyl-dNTP analogs.

Advantages of this system include the utilization of
highly efficient lambda in vitro packaging systems for
initially generating a library in restriction minus, recA
minus, E. coli hosts. Since the quality of source genomic
DNA may be lower than that required for naked DNA
transformations, packaged genomic DNA inserts may be
protected against degradation. Once inside an E. coli host
cell, damaged inserts may be repaired by the host's cellular
DNA repair mechanisms. The system requires only small
amounts of starting genomic DNA (5-10 µg), and size selection
may not be required since the packaging system only accepts
inserts in a certain size range. The initial library in E.
coli may be amplified to produce supercoiled cosmid DNA which
may be used in high efficiency transformation methods for
introduction into other expression host organisms.
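
Two consequences of the packaging constraint can be sketched numerically. The first function keeps only fragments inside the 30-42 kbp window stated for SuperCos 1. The second applies the Clarke and Carbon estimate, N = ln(1 - P) / ln(1 - f), for the number of clones needed so that any given sequence is present with probability P when each insert covers a fraction f of the genome; this formula is a standard approximation brought in here as an assumption and is not part of the original text. The function names and the 14 Mb genome size in the usage lines are hypothetical.

```python
import math

PACKAGING_WINDOW_BP = (30_000, 42_000)  # insert range stated for SuperCos 1


def packageable(fragment_sizes_bp, window=PACKAGING_WINDOW_BP):
    """Keep the fragment sizes that fall inside the packaging window (inclusive)."""
    low, high = window
    return [size for size in fragment_sizes_bp if low <= size <= high]


def clones_needed(genome_size_bp, mean_insert_bp, probability=0.99):
    """Clarke-Carbon estimate of library size for a target coverage probability."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < mean_insert_bp < genome_size_bp:
        raise ValueError("insert must be smaller than the genome")
    fraction = mean_insert_bp / genome_size_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


if __name__ == "__main__":
    print(packageable([5_000, 30_000, 36_000, 42_000, 50_000]))
    # Hypothetical 14 Mb donor genome, 35 kbp mean insert, 99% coverage.
    print(clones_needed(14_000_000, 35_000))
```

For mixed environmental DNA the effective "genome size" is the combined size of all donor genomes, which is usually unknown, so the estimate is best read as a lower bound.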

The SuperCos 1 vector may be further modified for
cloning in a Streptomyces host by replacing the SV40 origin
of replication and the neoR gene with the Streptomyces origin
of replication (e.g., from the plasmid pIJ101 or pIJ922), and
the thiostrepton resistance gene. This shuttle vector,
termed Streptocos (see Figure 6A), is constructed by
isolation of the 4.0 kb fragment from pIJ699 (Hopwood et al.
1985, Genetic Manipulation of Streptomyces, A Laboratory
Manual, The John Innes Foundation) containing the pIJ101
origin and the thiostrepton resistance gene by digestion with
KpnI and HindIII. This fragment is blunted at the KpnI site
and cloned into SuperCos at the SmaI-HindIII restriction
sites (see Bierman 1992, Denis 1992 for related examples).
In addition, sequence elements may be introduced for shuttle
cosmid mobilization via conjugative transfer (Bierman et al.
1992, Gene 116:43-49). Different Streptocos versions may be
constructed in which Streptomyces-specific promoters are
introduced into the vector adjacent to the BamHI cloning
site. By using PCR, Streptomyces promoter fragments may be
generated that can be directionally cloned into the
NotI/EcoRI sites of SuperCos 1. A variety of known
Streptomyces promoters may be used including ermE, Pptr
(1995, Mol Microbiol, 17:989) and hrdB (Buttner, M.J. 1989,
Mol Microbiol, Vol. 3, pp. 1653-1659). Moreover, one or more
replication origins can be engineered into Streptocos to
facilitate replication and maintenance of the vector among
various Streptomyces/Actinomycete species. For example, the
replication origin derived from pSG5 from Streptomyces
ghanaensis may be used; pSG5-based vectors have been shown to
be compatible with, and can therefore coexist with, origins
of replication from other Streptomyces plasmids, such as
SCP2*, SLP1.2, pIJ101 and pSVH1 (Muth et al., 1989, Mol Gen
Genet 219:341-348). This approach can generally be useful for
a wide range of host-vector systems. Accordingly, SuperCos 1
may be modified by introduction of host replication origins,
selectable marker genes, and homologous promoters if desired.

Another exemplary vector that can be used in 
constructing libraries of the invention is pBeloBAC11, which
is a modified bacterial artificial chromosome (BAC) based on
the plasmid F factor (Shizuya et al., 1992, Proc Natl Acad
Sci 89:8794-8797). The low copy number plasmid vector is
capable of handling >300 kb donor DNA and can be maintained
stably in E. coli. According to the invention, the plasmid
pBAC is modified by the introduction of a transfer origin
derived from the broad host range plasmid RK2, the
replication origin SCP2* from Streptomyces coelicolor (Lydiate
et al., 1985, Gene 35:223-235), and the apramycin resistance
gene.

For combinatorial gene expression libraries using plant 
cells as hosts, the expression of the donor coding sequence 
may be driven by any of a number of promoters. For example,
viral promoters such as the 35S RNA and 19S RNA promoters of
CaMV (Brisson et al. 1984, Nature 310:511-514), or the coat
protein promoter of TMV (Takamatsu et al. 1987, EMBO J.
6:307-311) may be used; alternatively, plant promoters such
as the small subunit of RuBisCO (Coruzzi et al. 1984, EMBO J.
3:1671-1680; Broglie et al. 1984, Science 224:838-843); or
heat shock promoters, e.g., soybean hsp17.5-E or hsp17.3-B
(Gurley et al. 1986, Mol. Cell. Biol. 6:559-565) may be used.
Preferred strains are described in Principles of Gene
Manipulation 1985, R.W. Old and S.B. Primrose, 3rd ed.,
Blackwell Scientific Pub.; Vectors: A Survey of Molecular
Cloning Vectors and Their Uses 1988, R.L. Rodriguez, D.T.
Denhardt, Butterworths Pub.; A Practical Guide to Molecular
Cloning 1988, B. Perbal, John Wiley and Sons.

Both plant cells and protoplasts may be used as host
cells. Plant hosts may include, but are not limited to,
those of maize, wheat, rice, soybean, tomato, tobacco,
carrots, peanut, potato, sugar beets, sunflower, yam,
Arabidopsis, rape seed, and petunia. Plant protoplasts are
preferred because of the absence of a cell wall, and their
potential to proliferate as cell cultures, and to regenerate
into a plant.

In addition, the recombinant constructs may comprise 
plant-expressible selectable or screenable marker genes which
include, but are not limited to, genes that confer antibiotic
resistance (e.g., resistance to kanamycin or hygromycin) or
herbicide resistance (e.g., resistance to sulfonylurea,
phosphinothricin, or glyphosate). Screenable markers
include, but are not limited to, genes encoding
β-glucuronidase (Jefferson, 1987, Plant Molec. Biol. Rep.
5:387-405), luciferase (Ow et al. 1986, Science 234:856-859),
and B protein that regulates anthocyanin pigment production
(Goff et al. 1990, EMBO J 9:2517-2522).

To introduce donor organism DNA into plant cells, the 
Agrobacterium tumefaciens system for transforming plants may 
be used. The proper design and construction of such T-DNA
based transformation vectors are well known to those skilled
in the art. Such transformations preferably use binary
Agrobacterium T-DNA vectors (Bevan, 1984, Nuc. Acid Res.
12:8711-8721), and the co-cultivation procedure (Horsch et
al. 1985, Science 227:1229-1231). Generally, the
Agrobacterium transformation system is used to engineer
dicotyledonous plants (Bevan et al. 1982, Ann. Rev. Genet
16:357-384; Rogers et al. 1986, Methods Enzymol.
118:627-641), but it may also be used to transform as well as
transfer DNA to monocotyledonous plants and plant cells
(see Hernalsteen et al. 1984, EMBO J 3:3039-3041;
Hooykaas-Van Slogteren et al. 1984, Nature 311:763-764;
Grimsley et al. 1987, Nature 325:1677-1679; Boulton et al.
1989, Plant Mol. Biol. 12:31-40; Gould et al. 1991, Plant
Physiol. 95:426-434).

In other embodiments, various alternative methods for 
introducing recombinant nucleic acid constructs into plant
cells may also be utilized. These other methods are
particularly useful where the target is a monocotyledonous
plant cell. Alternative gene transfer and transformation
methods include, but are not limited to, protoplast
transformation through calcium-, polyethylene glycol (PEG)-
or electroporation-mediated uptake of naked DNA (see
Paszkowski et al., 1984, EMBO J 3:2717-2722; Potrykus et al.
1985, Molec. Gen. Genet. 199:169-177; Fromm et al., 1985,
Proc. Nat. Acad. Sci. USA 82:5824-5828; Shimamoto, 1989,
Nature 338:274-276) and electroporation of plant tissues
(D'Halluin et al., 1992, Plant Cell 4:1495-1505). Additional
methods for plant cell transformation include microinjection,
silicon carbide mediated DNA uptake (Kaeppler et al., 1990,
Plant Cell Reports 9:415-418), and microprojectile
bombardment (see Klein et al., 1988, Proc. Nat. Acad. Sci.
USA 85:4305-4309; Gordon-Kamm et al., 1990, Plant Cell
2:603-618).

For general reviews of plant molecular biology
techniques see, for example, Weissbach & Weissbach, 1988,
Methods for Plant Molecular Biology, Academic Press, NY,
Section VIII, pp. 421-463; and Grierson & Corey, 1988, Plant
Molecular Biology, 2d Ed., Blackie, London, Ch. 7-9.

In an insect system, Autographa californica nuclear
polyhedrosis virus (AcNPV), a baculovirus, is used as a vector
to express donor genes in Spodoptera frugiperda cells. The
donor DNA sequence may be cloned into non-essential regions
(for example the polyhedrin gene) of the virus and placed
under control of an AcNPV promoter (for example the
polyhedrin promoter). These recombinant viruses are then
used to infect host cells in which the inserted gene is
expressed (e.g., see Smith et al. 1983, J Virol 46:584;
Smith, U.S. Patent No. 4,215,051).

In yeast, a number of vectors containing constitutive or
inducible promoters may be used with Saccharomyces cerevisiae
(baker's yeast), Schizosaccharomyces pombe (fission yeast),
Pichia pastoris, and Hansenula polymorpha (methylotrophic
yeasts). For a review see Current Protocols in Molecular
Biology, Vol. 2, 1988, Ed. Ausubel et al., Greene Publish.
Assoc. & Wiley Interscience, Ch. 13; Grant et al. 1987,
Expression and Secretion Vectors for Yeast, in Methods in
Enzymology, Eds. Wu & Grossman, 1987, Acad. Press, N.Y., Vol.
153, pp. 516-544; Glover, 1986, DNA Cloning, Vol. II, IRL
Press, Wash., D.C., Ch. 3; Bitter, 1987, Heterologous
Gene Expression in Yeast, Methods in Enzymology, Eds. Berger
& Kimmel, Acad. Press, N.Y., Vol. 152, pp. 673-684; and The
Molecular Biology of the Yeast Saccharomyces, 1982, Eds.
Strathern et al., Cold Spring Harbor Press, Vols. I and II.
An exemplary vector, pDblet (Brun et al. 1995, Gene,
164:173-177), that can shuttle between E. coli and S. pombe,
and modification of the vector are described in Section 5.5.7.

The invention also provides a series of S. pombe/E. coli
cosmid vectors, e.g., pPCos+ura (Figure 13) and pPCos1 (Figure
14), that contain a multiple cloning site; a ColE1 origin and
the ampicillin resistance gene for replication and selection,
respectively, in E. coli; an autonomously replicating
sequence (ARS) and the ura4 gene for maintenance and
selection in S. pombe; an SV40 origin; and dual cos sites for
in vitro packaging in λ phage. The construction of pPCos+ura
and pPCos1 is described in Section 5.5.7.

In mammalian host cells, a variety of mammalian
expression vectors are commercially available. In addition,
a number of viral-based expression systems may be utilized.
In cases where an adenovirus is used as an expression vector,
the donor DNA sequence may be ligated to an adenovirus
transcription/translation control complex, e.g., the late
promoter and tripartite leader sequence. This chimeric gene
may then be inserted in the adenovirus genome by in vitro or
in vivo recombination. Insertion in a non-essential region
of the viral genome (e.g., region E1 or E3) will result in a
recombinant virus that is viable and capable of expressing
heterologous products in infected hosts (see, e.g., Logan &
Shenk, 1984, Proc. Natl. Acad. Sci. (USA) 81:3655-3659). The
Epstein-Barr virus (EBV) origin (OriP) and EBNA-1 as a
trans-acting replication factor have been used to create shuttle
episomal cloning vectors, e.g., EBO-pCD (Spickofsky et al.
1990, DNA Prot Eng Tech 2:14-18). Viral vectors based on
retroviruses may also be used (Morgenstern et al. 1989, Ann
Rev Neurosci 12:47-65). Alternatively, the vaccinia 7.5K
promoter may be used (see, e.g., Mackett et al. 1982, Proc.
Natl. Acad. Sci. (USA) 79:7415-7419; Mackett et al. 1984, J.
Virol. 49:857-864; Panicali et al. 1982, Proc. Natl. Acad.
Sci. 79:4927-4931).

A number of selection systems may be used for mammalian
cells, including but not limited to the Herpes simplex virus
thymidine kinase (Wigler, et al. 1977, Cell 11:223),
hypoxanthine-guanine phosphoribosyltransferase (Szybalska &
Szybalski, 1962, Proc. Natl. Acad. Sci. USA 48:2026), and
adenine phosphoribosyltransferase (Lowy, et al. 1980, Cell
22:817) genes, which can be employed in tk-, hgprt- or aprt-
cells, respectively. Also, antimetabolite resistance can be
used as the basis of selection for dihydrofolate reductase
(dhfr), which confers resistance to methotrexate (Wigler, et
al. 1980, Proc. Natl. Acad. Sci. USA 77:3567; O'Hare, et al.
1981, Proc. Natl. Acad. Sci. USA 78:1527); gpt, which confers
resistance to mycophenolic acid (Mulligan & Berg, 1981,
Proc. Natl. Acad. Sci. USA 78:2072); neomycin
phosphotransferase (neo), which confers resistance to the
aminoglycoside G-418 (Colberre-Garapin, et al. 1981, J. Mol.
Biol. 150:1); and hygromycin phosphotransferase (hyg), which
confers resistance to hygromycin (Santerre, et al. 1984, Gene
30:147).

The present invention also provides specific
modifications of host organisms that improve the performance
of the combinatorial gene expression libraries. When the
libraries are used for the purpose of generating secondary
metabolites, the toxicity of the compounds can lead to
under-representation of the productive host organisms in the
library. In one embodiment of the invention, the host
organism may be modified so that the growth and survival of
the host organism is less adversely affected by the
production of compounds of interest. The increased tolerance
can reduce the loss of host organisms that are producing
potent drugs at the screening stage as well as the production
stage.

One preferred modification of the host organism is the
introduction into and/or over-production of active drug
efflux systems in the host organism. Membrane-associated
energy driven efflux plays a major role in drug resistance in
most organisms, including bacteria, yeasts, and mammalian
cells (Nikaido 1994, Science 264:382-388; Balzi et al. 1994,
Biochim Biophys Acta 1187:152-162; Gottesman et al. 1993, Ann
Rev Biochem 62:385). A modified host organism having an
enhanced complement of efflux systems can actively secrete a
broader range of potentially toxic compounds, thus reducing
their accumulation inside the host organism. Negative
feedback mechanisms, such as end-product inhibition of the
metabolic pathway producing the compounds, may be avoided.
Moreover, the isolation of the compounds may be made more
efficient since the compounds of interest do not accumulate
inside the host organisms.

In bacteria, a large number of efflux systems have been
studied which can pump out a wide variety of structurally
unrelated molecules ranging from, for example, polyketide
antibiotics (acrAE genes of E. coli, Ma et al. 1993, J
Bacteriol 175:6299-6313), fluoroquinolones and ethidium
bromide (bmr of Bacillus subtilis and norA of Staphylococcus
aureus, Neyfakh et al. 1993, Antimicrob Agents Chemother
37:128-129), and doxorubicin (drr of Streptomyces peucetius), to
quaternary amines (qacE of Klebsiella aerogenes and mvrC of
E. coli). See Table III for a list of non-limiting examples
of efflux systems. Any such efflux systems may be used in a
prokaryotic host organism.

In yeast, many genes conferring pleiotropic drug
resistance encode efflux systems, and may be useful in the
present invention. For example, the bfr1+ gene confers
brefeldin A resistance to Schizosaccharomyces pombe, and the
CDR1 gene of Candida albicans confers resistance to
cycloheximide and chloramphenicol (Prasad et al. 1995, Curr
Genet 27:320-329).

For mammalian cells, the multidrug resistance proteins
which belong to the class of ATP-binding pump proteins may be
used (Juranka et al. 1989, FASEB J 3:2583-2592; Paulusma et
al. 1996, Science 271:1126-1128; Zaman et al. 1994, Proc.
Natl Acad Sci 91:8822-8826; Breuninger et al. 1995, Cancer
Res 55:5342-5347; Koepsell EP 0699753). The human mdr1
multiple drug resistance gene has been functionally expressed
in Saccharomyces cerevisiae (Kuchler et al. 1992, Proc Natl
Acad Sci 89:2302-2306). Any other efflux systems may also be
used for eukaryotic cells.

Table III: List of compounds that are secreted by active drug efflux systems

chemical class            specific name                    efflux systems
------------------------  -------------------------------  --------------
cationic dyes             rhodamine-6G, ethidium bromide   bmr
                          acriflavine                      acrAE
basic antibiotics         puromycin                        bmr
                          doxorubicin                      drr, mdr
hydrophilic antibiotics   novobiocin, macrolide            acrAE
hydrophobic antibiotics   beta-lactams
organic cation            tetraphenylphosphonium           bmr
uncharged                 taxol                            mdr
                          chloramphenicol                  bmr
weak acid                 nalidixic acid                   emr
                          mithramycin                      mdr
zwitterions               fluoroquinolones                 bmr
detergent                 SDS                              acrAE

One or more efflux systems may be introduced, induced or
overproduced in a host organism. The genes encoding
components of an efflux system may be introduced into a
host organism and expressed using the expression vectors and
techniques described above. In some instances, it may be
advantageous to use an inducible promoter for expression of
the efflux system genes.

5.1.4. COMBINATORIAL NATURAL PATHWAY EXPRESSION 
LIBRARIES 

The present invention relates to the construction and
uses of combinatorial gene expression libraries, wherein the
host organisms contain genetic material encoding natural
biochemical pathways or portions thereof that is derived from
a plurality of species of donor organisms, and are capable of
producing functional gene products of the donor organisms.
Biochemical pathways or portions thereof of the donor
organisms are thus functionally reconstituted in individual
host organisms of a library. Novel activities and compounds
of such biochemical pathways may be more accessible to
screening by traditional drug discovery techniques or by
methods provided herein.

Either DNA or RNA may be used as starting genetic
material for preparing such libraries which may include cDNA
libraries, genomic DNA libraries, as well as mixed
cDNA/genomic DNA libraries. DNA fragments derived from a
plurality of donor organisms, e.g., organisms described in
Section 5.1.1, are introduced into a pool of host organisms,
such that each host organism in the pool contains a DNA
fragment derived from one of the donor organisms.

It may be advantageous if the host organism and the
donor organisms share certain genetic features, such as
similar GC content of DNA and common RNA splicing mechanisms,
or physiological features, such as optimal growth
temperature. It may thus be desirable to use a host organism
that is phylogenetically closely related to the donor
organisms. For instance, a prokaryotic host organism may be
more desirable for cloning and expression of operons of other
prokaryotes.

Donor organisms that are not amenable to traditional
drug discovery or drug development technologies may be
preferred. For example, most marine bacteria are poorly
characterized and not amenable to conventional terrestrial
microbiology protocols. The present invention can simplify
the development of production and purification processes.

The fragment of donor DNA that is transferred may
comprise coding regions encoding functional proteins of a
complete biochemical pathway or portions thereof, as well as
natively associated regulatory regions such as promoters and
terminators. Optimal results may be obtained by using large
prokaryotic genomic DNA fragments which have a greater
probability of encoding an entire biochemical pathway. If
the native function and organization of the transferred DNA
fragment is maintained in the host organism, the genes of the
donor organism may be coordinately expressed. Also provided
are exogenous regulatory regions that may be attached to the
DNA fragments so as to ensure transcription of the
transferred genes in the host organism, thereby replacing or
supplementing transcription initiated from the native
promoters.

Interestingly, many of the genes derived from marine
bacteria have been found to utilize the native promoters to
express functional proteins in E. coli. Thus, genes of
marine microorganisms may be expressed even without the need
to use exogenous regulatory regions. An exemplary list of
marine bacterial genes that use their native promoters in E.
coli is provided in Table IV.

Table IV: Marine bacterial genes that use their native promoters in E. coli

Gene(s)                           Genus & Species                             Reference
--------------------------------  ------------------------------------------  ------------------------------------------------------------------
kappa-carrageenase (cgkA)         Alteromonas carrageenovora, gram(-) aerobe  Barbeyron et al., 1994, Gene 139:105-109
Na+/H+ antiporter (NhaA)          Vibrio alginolyticus                        Nakamura et al., 1994, Biochim Biophys Acta 1190:465-468
phosphodiesterase (cpdP)          Vibrio fischeri, symbiont                   Dunlap et al., 1993, J. Bact. 175(15):4615-4624
chitinase                         Alteromonas sp. Strain O-7                  Tsujibo et al., 1993, J. Bact. 175(1):176-181
tributyl tin chloride resistance  Alteromonas sp. M-1, gram(-) rod            Fukagawa et al., 1993, Biochem. Biophys. Res. Comm. 194(2):733-740
dagA-complementing                Alteromonas haloplanktis, gram(-)           MacLeod et al., 1992, Mol. Micro. 6(18):2673-2681
vibriolysin (nprV)                Vibrio proteolyticus, gram(-)               David et al., 1992, Gene 112:107-112
tetracycline resistance           Vibrio salmonicida, aerobe                  Sorum et al., 1992, Chemo. 36(3):611-615
melanin synthesis (melA)          Shewanella colwelliana, gram(-) periphyte   Fuqua et al., 1991, Gene 109:131-136
DNA modification cluster          Hyphomonas jannaschiana, thermophile        Danaher et al., 1990, Gene 89:129-133

In a preferred embodiment, the method of the invention
takes advantage of the way that genes of prokaryotes, such as
bacteria, are organized into discrete functionally-related
gene clusters in the genome, termed operons. In these
clusters, genes encoding components of a biochemical pathway
are linked together to common regulatory sequences.
Functionally related genes in filamentous fungi
(Actinomycetes) are also known to be clustered. Gene
clusters for many bacterial and actinomycete, and a few
eukaryotic fungal, biosynthetic pathways have been isolated
and characterized. For example, twelve proteins used to
produce the carotenoids zeaxanthin and beta-cryptoxanthin de
novo in Erwinia herbicola can be activated and produced
synchronously in the bacterium E. coli (Perry et al. 1986, J.
Bacteriol. 168:607-612; Hundle et al. 1991, Photochem and
Photobiol 54:1:89-93). In addition, prokaryotic amino acid
biosynthetic pathways such as leucine and isoleucine
biosynthesis, as well as glucose transfer systems, are also
contained in discrete clusters. Thus, when prokaryotes are
used as donor organisms, it is likely that genes that are
functionally related in a biosynthetic pathway would be
isolated in one clone.

Donor organisms having compact genomes that contain
relatively few non-coding regions are preferred. In many
aspects, the donor organisms are bacteria which have a
relatively small genome, for example, 4400 kbp in length for
E. coli, and 2500-3500 kbp for archaebacteria. The number of
independent clones required in a library to achieve a 99%
probability of containing all of the sequences of the donor
genomes is calculated from the following formula (Clarke et
al. 1976, Cell 9:91-99):

$$N = \frac{\ln(1 - P)}{\ln(1 - f)}$$

where
N = number of recombinant clones necessary in the library
P = the probability a sequence is represented
f = the fractional proportion of the genome in a single
recombinant clone

For example, E. coli has approximately 4400 kbp of DNA;
a cosmid vector can package approximately 40 kbp of DNA.
Following these calculations (P = 0.99, f = 40/4400), the
entire genome of E. coli can be expected to be thoroughly
represented in as few as 504 clones in a cosmid library.
Since a typical DNA library can
contain 500,000 independent recombinant clones, one such
library can effectively represent the genomes of up to 1,000
different bacterial species having a genome size similar to
E. coli. Thus, considerable chemical diversity can be
generated and assessed efficiently by screening a gene
expression library comprising the diverse genetic material of
1,000 or more species of bacteria.
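
The library-size arithmetic above can be checked with a short calculation. The following is a minimal sketch in Python, not part of the disclosed method; the function name clones_required and its parameter names are illustrative assumptions, and the inputs are the 99% probability, 40 kbp cosmid insert, 4400 kbp genome and 500,000-clone library stated in the text.

```python
import math


def clones_required(p, insert_kbp, genome_kbp):
    """Estimate N = ln(1 - P) / ln(1 - f), with f = insert size / genome size.

    The function and parameter names are hypothetical; only the formula
    itself comes from the text.
    """
    f = insert_kbp / genome_kbp
    return math.log(1.0 - p) / math.log(1.0 - f)


# Values taken from the worked example in the text.
n = clones_required(p=0.99, insert_kbp=40, genome_kbp=4400)

# Number of E. coli-sized genomes that a 500,000-clone library could cover.
species = 500_000 / n

print(f"clones per genome: {n:.1f}")
print(f"genomes per 500,000-clone library: {species:.0f}")
```

Under these inputs the estimate comes to roughly 504 clones per genome, which is consistent with the figure of approximately 1,000 species per library.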

The procedures described in standard treatises, e.g.,
Maniatis et al. 1989, Molecular Cloning, 2nd Edition, Cold
Spring Harbor Press, New York; and Ausubel et al., Current
Protocols in Molecular Biology, Greene Publishing Associates
and Wiley Interscience, New York, may be followed to carry
out routine molecular biology reactions used in constructing
the combinatorial gene expression libraries.

A cloning strategy for a combinatorial natural pathway
gene expression library is shown in Figure 3. Any cell from
a donor organism can potentially serve as the source of
nucleic acid for construction of a gene expression library.
Genomic DNA, which includes chromosomal DNA as well as DNA of
extrachromosomal genetic elements, such as naturally
occurring plasmids, may be used. Alternatively, RNA of a
donor organism may be used. RNA, preferably messenger RNA
(mRNA), may be extracted, purified and converted to
complementary DNA (cDNA) by any technique known in the art.
An oligo-(dT) primer or random sequence primers may be used
for priming first strand synthesis of cDNA. DNA inserts may
optionally be amplified by polymerase chain reaction (PCR).

Genomic DNA and RNA may be extracted and purified by the 
procedure provided in Section 5.1.2 or by those that are 
known in the art. For filamentous fungi and bacteria, such 
procedures may comprise any of several techniques including 
a) rapid SDS/high salt lysis of protoplasts prepared from 
young mycelia grown in liquid culture and immediate 
extraction with equilibrated phenol; b) rapid lysis of 
protoplasts in guanidinium isothiocyanate followed by 
ultracentrifugation in a CsCl gradient; or c) isolation of 
high molecular weight DNA from protoplasts prepared in 
agarose plugs and pulsed field gel electrophoresis. For 
bacteria, an alternative procedure of lysis by
lysozyme/detergent, incubation with a non-specific protease,
followed by a series of phenol/chloroform/isoamyl alcohol
extractions may be useful.

For optimal results, large random prokaryotic genomic 
DNA fragments are preferred for the higher probability of 
containing a complete operon or substantial portions thereof. 
The genomic DNA may be cleaved at specific sites using 
various restriction enzymes. Random large DNA fragments 
(greater than 20 kbp) may be generated by subjecting genomic 
DNA to partial digestion with a frequent-cutting restriction 
enzyme. The amount of genomic DNA required varies depending 
on the complexity of the genome being used. Alternatively, 
the DNA may be physically sheared, as for example, by passage 
through a fine-bore needle, or sonication. 

Prior to insertion into a vacant expression vector, such
DNA inserts may be separated according to size by standard
techniques, including but not limited to, agarose gel
electrophoresis, dynamic density gradient centrifugation, and
column chromatography. A linear 10-40% sucrose gradient is
preferred. The insertion can be accomplished by ligating the
DNA fragment into an expression vector which has
complementary cohesive termini. The amounts of vector DNA
and DNA inserts used in a ligation reaction are dependent on
their relative sizes, and may be determined empirically by
techniques known in the art. However, if the complementary
restriction sites used to fragment the DNA are not present in
the expression vector, the ends of the DNA molecules may be
enzymatically modified, for example, to create blunt ends.
Alternatively, any site desired may be produced by ligating
nucleotide sequences, i.e., linkers or adaptors, onto the DNA
termini; these ligated linkers or adaptors may comprise
specific chemically-synthesized oligonucleotides encoding
restriction endonuclease recognition sequences. In an
alternative method, the cleaved expression vector and DNA
inserts may be modified by homopolymeric tailing.
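
As an illustration of how the relative sizes of vector and insert bear on the amounts used in a ligation, the sketch below applies the commonly used mass conversion for a chosen insert:vector molar ratio. It is only a starting-point estimate under stated assumptions: the function name insert_mass_ng, the example sizes and the molar ratio are hypothetical and are not taken from the text, which leaves the amounts to empirical determination.

```python
def insert_mass_ng(vector_ng, vector_kbp, insert_kbp, molar_ratio=1.0):
    """Mass of insert DNA (ng) that gives the requested insert:vector molar ratio.

    Because moles of double-stranded DNA scale as mass / length, equal molar
    amounts require masses in proportion to fragment length.
    """
    return vector_ng * (insert_kbp / vector_kbp) * molar_ratio


# Hypothetical example: 100 ng of an 8 kbp vector and 40 kbp inserts at 1:1.
example_ng = insert_mass_ng(vector_ng=100, vector_kbp=8, insert_kbp=40)
print(f"insert DNA needed: {example_ng:.0f} ng")
```

The ratio actually giving the most recombinants would still be determined empirically, for example by testing several molar ratios around this estimate.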

After ligation of vector DNA to DNA inserts, the
expression constructs are introduced into the host organisms.
A variety of methods may be used, which include but are not
limited to, transformation, transfection, infection,
conjugation, protoplast fusion, liposome-mediated transfer,
electroporation, microinjection and microprojectile
bombardment. In specific embodiments, the introduction of
bacteriophage or cosmid DNA into an E. coli host is carried
out by in vitro packaging the DNA into bacteriophage
particles then allowing these particles to infect E. coli
cells. Other naturally-occurring mechanisms of DNA transfer
between microorganisms may also be used, e.g., bacterial
conjugation.

After the host cells containing expression constructs
are pooled to form a library, they can be amplified and/or
replicated by techniques known in the art. The purpose of
amplification is to provide a library that can be used many
times. Amplification may be achieved by plating out the
library, allowing the bacteria to grow, and harvesting the
phage or bacteria for storage.

Alternatively, the library may be stored in an ordered
array. The bulk of the library can be plated out at low
density to allow formation of single, discrete plaques or
colonies, followed by transfer of individual plaques or
colonies into the wells of coded multi-well master plates,
e.g., 96-well plates or 384-well plates. The individual
clones are allowed to grow in the wells under the appropriate
conditions. The coded master plates can be used as an
archival source to replicate each clone separately into one
or more working plates. Thus, each clone in the library may
be handled and assayed individually. The coded archival
plates may be sealed and stored for future use. Replication
and transfer of the clones may be done with a multi-pin
replicator, or multi-channel devices for fluid handling.
Preferably, all or most of the transfers and manipulations
are performed by laboratory robots (Bentley et al. 1992,
Genomics 12:534-541).
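
One way to picture the coding of master plates is as a mapping from a clone's serial number to a plate number and well label. The sketch below is an assumption made for illustration only: the text does not specify a coding scheme, and the function name well_address, the row-by-row filling order and the label format are all hypothetical.

```python
import string


def well_address(clone_index, rows=8, cols=12):
    """Map a zero-based clone index to (plate number, well label).

    Defaults describe a 96-well plate (8 rows x 12 columns); pass
    rows=16, cols=24 for a 384-well plate. Wells are filled row by row.
    """
    wells_per_plate = rows * cols
    plate, offset = divmod(clone_index, wells_per_plate)
    row, col = divmod(offset, cols)
    label = f"{string.ascii_uppercase[row]}{col + 1:02d}"
    return plate + 1, label


# The 97th clone (index 96) starts the second 96-well plate.
print(well_address(96))
# The last well of a 384-well plate.
print(well_address(383, rows=16, cols=24))
```

Keeping the same mapping for archival and working plates means that an assay result in a working plate can be traced back to a single archived clone.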

The libraries of the invention may be preserved by
lyophilization, or cryopreservation in a freezer (at -20°C to
-100°C) or under liquid nitrogen (-176°C to -196°C).

Host organisms containing donor DNA in a library may be
identified and selected by a variety of methods depending on
the host-vector system used. In one approach, such host
organisms are identified and selected upon the presence or
absence of marker gene functions, e.g., thymidine kinase
activity, resistance to antibiotics, such as kanamycin,
ampicillin, bleomycin, or thiostrepton, production of
pigment, such as melanin, and resistance to methotrexate.
Alternatively, a change in phenotype or metabolism of the
host organism, indicated by metabolic testing, foci formation
in tissue culture, or occlusion body formation in baculovirus
may be used. Once selected for the presence of donor DNA, a
series of enzymatic assays or metabolic tests may be carried
out on the clones for further characterization.

To characterize the donor DNA inserts in a library of
clones containing donor DNA or a portion thereof, mini
preparations of DNA and restriction analysis may be performed
with a representative set of clones. The results will
provide a fingerprint of donor DNA size and restriction
patterns that can be compared to the range and extent of
insert DNA which is expected of the library.

5.1.5. COMBINATORIAL CHIMERIC PATHWAY EXPRESSION LIBRARIES
The present invention also relates to the construction
and uses of combinatorial chimeric pathway expression
libraries, wherein the host organisms contain randomly
concatenated genetic materials that are derived from one or
more species of donor organisms, and are capable of producing
functional gene products of the donor organisms. A
substantial number of host organisms in the library may
contain a random and unique combination of genes derived from
one or more species of donor organism(s). Coexpression of
the cloned genes may be effected by their respective native
regulatory regions or by exogenously supplied regulatory
regions. The plurality of gene products derived from the
different donor organisms interact in the host organism to
generate novel chimeric metabolic pathways and novel
compounds. Novel activities and compounds of such chimeric
pathways may become more accessible to screening by
traditional drug discovery techniques or by methods provided
herein.

While not limited to any theory of how novel pathways or
compounds are generated in a combinatorial chimeric pathway
gene expression library, the coexpression of functional
heterologous genes derived from one or a plurality of species
of donor organisms enables the gene products to interact in
vivo with each other, and with elements of the host organism.
Through such interactions, new sets of biochemical reactions
will arise, some of which can act in concert to form a
chimeric biochemical pathway. The heterologous gene products
may encounter substrates, cofactors and signalling molecules
that are not present in their respective donor organism.
Such substrates, cofactors and signalling molecules may be
supplied by the host organism, by other heterologous gene
products that are coexpressed in the same host organism, or
from the medium.

Moreover, some of the heterologous gene products may be 
modified structurally, and compartmentalized or localized 
differently during biosynthesis in the host organism. Some 
of the heterologous gene products may be exposed to a host 
cellular environment that is different from that of their 
respective donors. 

It is envisioned that some heterologous gene products 
may also act on the host organism and modify the host 
cellular environment. Elements of the host cellular 
environment that may affect, or be affected by, the function 
of heterologous gene products may include but are not limited 
to concentrations of salts, trace elements, nutrients, 
oxygen, metabolites, energy sources, redox states, and pH. 
Some heterologous gene products may also interact with host 
gene products which can result in the modification of the 
host's metabolic pathways. 

Depending on the combination of heterologous genes,
novel chimeric biochemical pathways and novel classes of
compounds that do not exist in nature may be formed in the
host organisms of the library. In combinatorial chimeric
pathway expression libraries, the genetic resources of the
donor organisms are multiplied and expanded to provide a
diversity of chemical structure that may not be found in
individual organisms. The libraries so prepared may be
screened using traditional methods or methods provided by the
present invention. Thus, the novel pathways and compounds
are made more accessible to drug screening.

Any of the donor organisms described in Section 5.1.1 
may be used in preparing a combinatorial chimeric pathway 
expression library. Donor organisms may be selected on the 
basis of their known biological properties, or they may be a 
mixture of known and/or unidentified organisms. 

The combinatorial chimeric pathway expression libraries
of the invention may be assembled according to the principles
described in Section 5.1.3. In order to allow the random
concatenation of DNA fragments from multiple species of donor
organisms, the procedure for library assembly may be modified
by including the following steps: generation of smaller
genomic DNA fragments, ligation with regulatory sequences
such as promoters and terminators to form gene cassettes, and
concatenation of the gene cassettes.

Insert DNAs may be complementary DNA (cDNA) derived from 
mRNA, and/or fragments of genomic DNA, or DNA from an 
archival or mobilizable combinatorial expression library. 
The DNA or RNA of different species of donor organisms may be 
copurified, or they may be isolated separately and then 
combined in specific proportions. The random mixing of 
insert DNAs can be done at any stage prior to insertion into 
the cloning or expression vector. For example, large pieces
of DNA from an archival library can be isolated and digested
to give smaller fragments, which are then randomly religated
to form insert DNAs for a second combinatorial expression
library. Other methods for generating and mixing of random
fragments of DNA can also be used; for example, in vitro
recombination can be used when the DNA fragments share some
sequence homologies. Details of such methods are provided in
Section 5.1.7.

Methylated nucleotides, e.g., 5-methyl-dCTP, may be used 
in cDNA synthesis to provide protection against enzymatic 
cleavage, and allow directional cloning of the cDNA inserts 
in the sense orientation relative to the promoter and 
terminator fragments. 

Random fragments of genomic DNA in the range of 2-7 kbp
may be generated by partial digestion with a restriction
enzyme having a relatively high frequency of cutting sites,
e.g., Sau3AI. Partial digestion is monitored and confirmed
by subjecting aliquots of the samples to agarose gel
electrophoresis.
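
To indicate why the digestion must be partial, the sketch below estimates how often a frequent cutter is expected to cut. It rests on a simplifying assumption that is not made in the text: a random sequence with equal base frequencies, in which a site of n bases occurs on average once every 4^n bp. Sau3AI recognizes the 4-base site GATC; the function names are hypothetical.

```python
def mean_fragment_bp(site_length):
    """Expected fragment length after complete digestion of random-sequence DNA."""
    return 4 ** site_length


def fraction_of_sites_cut(target_bp, site_length=4):
    """Approximate fraction of sites that must be cut to average target_bp fragments."""
    return mean_fragment_bp(site_length) / target_bp


# A 4-base cutter such as Sau3AI would give ~256 bp fragments if digestion went
# to completion, far below the 2-7 kbp range sought.
print(mean_fragment_bp(4))
for target in (2000, 7000):
    print(target, f"{fraction_of_sites_cut(target):.3f}")
```

On this rough model only a small minority of sites is cut in a suitable partial digest, which is why aliquots are monitored by gel electrophoresis; real genomes deviate from the equal-base-frequency assumption.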

Exogenous regulatory regions, such as constitutive or
inducible promoters and terminators, may be provided to drive
expression of the cloned genes. When the host and donor
expression systems are not compatible, it is essential to
provide such regulatory sequences. PCR may be used to
generate various promoter and terminator fragments that are
specific to a particular expression host, and have defined
restriction sites on their termini. Any method for
attachment of a regulatory region to the DNA inserts may be
used. Treatment with the Klenow fragment and a partial set
of nucleotides, i.e., a partial fill-in reaction, may be used
to create insert DNA fragments which will only ligate
specifically to promoter and terminator fragments with
compatible ends.

The present invention provides a method involving the
use of gene cassettes which contain two copies of a
promoter, oppositely positioned on either side of a unique
restriction site. Any DNA inserted into this restriction
site will be transcribed on both strands by the two promoters,
one from each side.

The present invention also provides an alternative
method involving the use of gene cassettes which contain a
promoter and a terminator positioned on either side of a DNA
insert. If the procedure for directional cloning of cDNA is
followed, the 5' ends and 3' ends of the cDNA inserts would
have unique matching restriction sites with the 3' ends of
the promoter fragments and the 5' ends of the terminator
fragments respectively.

Genomic DNA fragments or cDNAs bearing compatible
restriction sites at both ends are ligated to the promoters
and, in some cases, terminator fragments, to form gene
cassettes having a mean size of approximately 1-10 kbp.

Concatemers comprising multiple transcription units are
assembled by an approach similar to that used in peptide
synthesis. A subset of the pool of gene cassettes is bound
at one end to a solid phase, e.g., a magnetic bead. The
other free end is subjected to several successive cycles of
"de-protection" and serial ligation of the remaining pool of
transcription units. The solid phase allows separation of
the concatemers from the unligated DNA fragments after each
addition cycle. When concatenation is completed, the
concatemers are released by incubation with a restriction
enzyme, such as an intron nuclease, that cleaves a unique and
very rare site adjacent to the solid phase to reduce the
probability of cleaving the concatenated DNA. Concatenated
DNA may then be inserted into a cloning vector to form
expression constructs which are introduced into the
appropriate host organisms. Alternatively, the constructs
may be transformed into an E. coli recA minus strain for
amplification prior to introduction into the host organisms.
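
The stepwise assembly described above can be pictured with a toy model. The sketch below is illustrative only and makes assumptions not found in the text: exactly one cassette is added per ligation cycle, every cassette in the pool is equally likely to be added, and cassette orientation is ignored. The function names and the example pool are hypothetical.

```python
import random


def assemble_concatemer(pool, cycles, rng):
    """Toy model: one bead-bound cassette plus one random cassette per cycle."""
    concatemer = [rng.choice(pool)]  # cassette attached to the solid phase
    for _ in range(cycles):
        concatemer.append(rng.choice(pool))  # one ligation cycle
    return concatemer


def possible_orderings(pool_size, length):
    """Number of ordered concatemers of a given length, repeats allowed."""
    return pool_size ** length


rng = random.Random(0)  # fixed seed so the sketch is repeatable
cassettes = [f"cassette_{i}" for i in range(1000)]
print(assemble_concatemer(cassettes, cycles=4, rng=rng))
print(possible_orderings(len(cassettes), 5))
```

Even under these simplifications, the number of possible ordered combinations grows as the pool size raised to the concatemer length, which is the sense in which the genetic resources of the donors are multiplied.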

Details of the synthesis of the promoter and terminator 
fragments, the preparation of gene cassettes, the assembly of 
the DNA inserts, and the ligation of insert and vector, are 
provided in Sections 5.4 and 5.5.

Once the combinatorial chimeric pathway expression
library is assembled, it can be stored, amplified,
replicated, pre-screened and screened essentially in the same
manner as described in Section 5.1.3. Where the vector
contains the appropriate replication origins, transfer
origin(s), and/or selection mechanisms, the genetic material
in the library can be transferred from one species of host
organism to another species or strain for expression.

5.1.6. BIASED COMBINATORIAL EXPRESSION LIBRARIES
In another embodiment of the invention, a biased
combinatorial natural or chimeric pathway expression library
may be prepared from preselected fragments of DNA that are
pooled together from one or more species of donor organisms,
instead of using only the total pooled genomic DNA or cDNA of
the donor organism(s). This approach will reduce the number
of clones that need to be screened and increase the
percentage of clones that will produce compounds of interest.

The preselected fragments of DNA contain genes encoding
partial or complete biosynthetic pathways, and may be
preselected by hybridizing to an initial or archival DNA
library a plurality of probes prepared from known genes that
may be related to or are involved in producing compounds of
interest.
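
As a rough computational analogy for preselection by probe
hybridization, the sketch below keeps only those library clones whose
insert shares a short exact subsequence with a probe. This is an
illustration under stated assumptions, not the laboratory procedure:
exact k-mer matching is a crude stand-in for hybridization, only one
strand is compared, and the clone names and sequences are invented.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def preselect(library, probes, k=8):
    """Return names of clones sharing at least one k-mer with any probe."""
    probe_kmers = set()
    for probe in probes:
        probe_kmers |= kmers(probe.upper(), k)
    return [name for name, insert in library.items()
            if kmers(insert.upper(), k) & probe_kmers]


# Hypothetical archival library: clone name -> insert sequence.
library = {
    "clone_1": "ATGGCGTTCGACCTGAAGGCTTAA",
    "clone_2": "ATGAAATTTGGGCCCAAATTTTAA",
}
probes = ["TTCGACCTGAAG"]  # hypothetical probe from a known gene
print(preselect(library, probes))
```

Only the clones returned by such a filter would be pooled to build the
biased library, which is why fewer clones need to be screened.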

The initial DNA library, preferably a cosmid or
bacterial artificial chromosome (BAC) library, and not
necessarily an expression library, may contain DNA from one
or more species of donor organisms. For further
pre-screening, if the initial library is an expression library,
DNA in the positive clones may be transferred into and 
expressed in a host for production, such as E. coli or 
Streptomyces lividans. More than one initial library may be 
pre-screened, and DNA from all the positive clones can be 
pooled and used for making the biased combinatorial gene 
expression library. 

The initial or archival library may be amplified so that 
DNA of the donor organisms can be pre-screened in a variety
of host organisms. In one aspect of the invention, the
cloning vector or expression vector can contain the
appropriate replication origins and/or transfer origin(s) as
described in Section 5.1.3, such that the entire initial or
archival library can be transferred or mobilized into various
compatible host organisms via conjugation. The transfer can
also be effected by isolating the donor genetic materials
from the archival library and introducing the genetic
material into another species or strain of organism by any
means, such as but not limited to transformation,
transfection and electroporation. For example, once a gene
expression library in Streptomyces lividans is generated, it
can be introduced into specialized host organisms for
expression and screening, such as S. rimosus that produces
oxytetracycline, or S. parvulus that produces actinomycin D.
As another example, an archival library can be constructed in
E. coli, and preselected by hybridization with nucleic acid
probes to identify genetic materials of interest. The
preselected DNA fragments can be isolated from the archival
library, then randomly mixed, and cloned into a mobilizable
expression vector to form a biased combinatorial expression
library. Such a library is enriched for a particular class
of metabolic pathway and can be transferred into different
host organisms for expression in different biochemical and/or 
genetic backgrounds. 

The probes used for pre-screening may be derived from
any cloned biosynthetic pathway, such as the polyketide
biosynthetic loci, as these are the best characterized
biosynthetic loci and there is considerable sequence
conservation between the known clusters, e.g., actI
(actinorhodin biosynthesis - Malpartida et al. 1987 Nature
325:818-820), whiE (spore pigment biosynthesis - Blanco et
al. 1993 Gene 130:107-16) and eryAI (Donadio et al. 1991
Science 252:675-679). Similar principles may be applied to
other antibiotic or secondary metabolite biosynthetic loci.
For example, the cloned peptide synthetase genes in low-GC
gram positive bacteria, such as Bacillus (Stachelhaus et al.
1995 Science 269:69-72) and in high-GC gram positive
bacteria, such as actinomycetes species that produce
thiostrepton, virginiamycin, valinomycin and actinomycin, may
have enough sequence similarities to be used as probes to
identify new biosynthetic loci in both groups of bacteria.
Other cloned biosynthetic pathways, such as peptide synthases
and aminoglycoside synthases, can also provide probes for
pre-screening the initial libraries.

Alternatively, the initial DNA library may be screened
by probes derived from DNA that encodes proteins involved in
secondary metabolism. Such probes may be prepared by
subtracting non-coding DNA and DNA encoding proteins that
relate to primary metabolism biosynthetic pathways from total
DNA. The remaining DNA is thus biased toward coding regions
that encode proteins involved in secondary metabolism.
Details of the subtraction procedure are provided in Section
5.3.5.

5.1.7. RECOMBINED COMBINATORIAL EXPRESSION LIBRARIES

In another aspect of the invention, combinatorial gene 
expression libraries can be prepared from genetic materials 
derived from a plurality of organisms, wherein the genetic 
materials have been manipulated by homologous or homeologous 
recombination. 

Homologous recombination is a fundamental cellular
process that all organisms employ for generation of
genetic diversity through random assortment of genes, and for
maintenance of genome integrity through DNA repair. The
recombination process has been studied in vitro with cell
extracts or purified components, and defined DNA substrates,
including DNA substrates with mismatches. Homeologous
recombination refers to recombination between mismatched or
imperfectly matched DNA strands. The recombination process
typically comprises the following mechanistic steps:
initiation, DNA strand exchange, DNA heteroduplex extension,
and resolution. Initiation involves the creation of single
stranded or double stranded DNA breaks suitable for use by
recombination enzymes. DNA strand exchange involves
recognition of DNA sequences between two regions of one or
more DNA molecule(s), disruption of existing base pairing,
and formation of heteroduplex DNA. Heteroduplex DNA
extension occurs resulting in branch migration and formation
of a multi-strand Holliday junction which is resolved by DNA
strand cleavage(s). The mechanism and proteins involved in
genetic recombination are reviewed in detail in Smith (1988,
Microbiol. Rev. 52:1-28); West, S. C. (1992, Annu. Rev.
Biochem. 61:603-640); and Camerini-Otero & Hsieh (1995, Annu.
Rev. Genet. 29:509-552).

Homologous recombination and homeologous recombination 
can be used in the present invention to improve the 
efficiency of generating novel metabolic pathways. The 
process of homologous or homeologous recombination can be 
carried out in vivo within a live cell, preferably a
recombination-permissive cell, or in vitro in a reaction
containing cell extracts and/or isolated recombination
enzymes. By facilitating exchanges of DNA segments between
DNA molecules only in regions where there are sequence
similarities, the resulting pool of DNA comprises recombined
genes that encode products with altered and/or novel
properties, and rearranged gene clusters comprising
recombined genes and novel combinations of genes. The novel
combinations can be the results of additions, exchanges and
deletions of genes or portions thereof. In this process,
interactions of the various DNA strands in recombination
occur randomly, but because the processes of homologous and
homeologous recombination require DNA sequence similarities,
the new genetic sequences incorporated into the gene cluster 
are more likely to be structurally or functionally related to 
the other gene sequences in the cluster. 

If a single DNA molecule contains sequences that are 
similar and repetitive along the length of the molecule, 
intramolecular recombination can occur between these regions 
of the same DNA molecule. The likelihood of intermolecular 
versus intramolecular recombination is affected by the 
reaction conditions, for example, DNA concentrations, which 
can be manipulated depending on the desired result. Low DNA 
concentration favors intramolecular recombination. Such 
intramolecular recombination is also within the scope of the 
invention. 
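
The dependence on DNA concentration can be illustrated with a toy
rate model. This is a sketch under explicit assumptions that are not
taken from the source: intramolecular events are treated as occurring
at a concentration-independent rate (k_intra), intermolecular events
at a rate proportional to DNA concentration (k_inter * dna_conc), and
both rate constants and units are arbitrary.

```python
def intramolecular_fraction(dna_conc, k_intra=1.0, k_inter=1.0):
    """Fraction of recombination events that are intramolecular.

    Toy assumption: per-molecule intramolecular rate is k_intra, and
    per-molecule intermolecular rate is k_inter * dna_conc.
    """
    if dna_conc < 0:
        raise ValueError("concentration must be non-negative")
    return k_intra / (k_intra + k_inter * dna_conc)


# Arbitrary concentration units; lower values favor intramolecular events.
for conc in (0.01, 0.1, 1.0, 10.0):
    print(f"{conc:>5} -> {intramolecular_fraction(conc):.3f}")
```

In this model the intramolecular fraction falls steadily as the DNA
concentration rises, consistent with the statement that low DNA
concentration favors intramolecular recombination.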

Homologous and homeologous recombination generates 
sequence diversity by facilitating intermolecular and 
intramolecular DNA strand exchange where the DNA sequences 
are similar. Depending on the locations of the initial 
breakpoints and the final cleavage points which affect the 
extent of the DNA strand exchange, novel chimeric genes and 
chimeric transcriptional units (or operons) can be generated. 
The recombination process tolerates DNA mismatches within the 
heteroduplex DNA and can be exploited to facilitate 
substitutions, insertions, and deletions of DNA sequences. 
For example, when the DNA strand exchange
falls within a gene, a chimeric gene comprising DNA
sequences derived from a plurality of different organisms can
be created. The resulting chimeric gene can comprise
substitutions, insertions, and deletions, along the length of
the recombined portion of the gene. The chimeric gene may
have a different promoter, enhancer, terminator, and gene
structure; for example, the number and arrangement of exons
and introns can be altered. The location of regulatory
sequences relative to the coding regions of the chimeric gene
can also be changed. Consequently, the chimeric gene can
acquire a different set of transcriptional, translational, and
splicing signals, which causes the chimeric gene to be
regulated differently. The product of the chimeric gene will 
likely be structurally homologous to the original gene 
product and thus, display similar but modified functional 
properties, e.g., specificities and kinetics of interactions 
with other molecules may be modified. When the extent of the 
DNA strand exchange spans multiple genes, or multiple 
transcriptional units or operons, novel combinations of 
genes, operons and polycistronic messenger RNA are created. 
Aside from modifying the structure of the genes at the 
junction of the exchange in a manner as described above, the 
structure and function of the entire transcriptional unit or 
operon can be altered. The normal pattern of expression of 
a cluster of functionally-related genes can be altered as the 
mechanisms and signals that regulate the transcription and 
translation of these genes are disrupted. The effect is 
particularly pronounced in many prokaryotic autogenously- 
controlled operons wherein a gene product of an operon 
regulates the overall expression of the operon which encodes 
the gene product. 

Accordingly, homologous and homeologous recombination is 
useful in generating novel metabolic pathways from metabolic 
pathways of microorganisms. The proteins in these pathways 
are frequently encoded by genes organized in clusters or 
operons where there are multiple regions of substantial DNA 
sequence similarities along each of the genes in the operon. 
The methods of the invention are particularly applicable to
biosynthetic pathways that are characteristically encoded by
a few large genes which are transcribed and translated as 
high molecular weight multifunctional and/or multimeric 
biosynthetic enzymes, such as the type I polyketide
synthases. Exchange of DNA facilitated by homologous or
homeologous recombination may preserve the translation
reading frames of these large genes and reduce the chance of
complete disruption of the interactions of these
multifunctional and/or multimeric biosynthetic enzymes. This
approach better ensures that the recombined genes encode
products of similar or related functions. The methods of the
invention are also well suited to exchange of DNA of related
genes among bacteria, such as actinomycetes and myxobacteria,
which display a characteristically high GC content that
imparts gene and protein sequence conservation due to bias in
codon usage.

While not limited to any particular theory of how 
homologous or homeologous recombination occurs, the non- 
limiting examples and discussions in this section serve to 
illustrate that novel genes, transcriptional units, operons 
and metabolic pathways can be generated by the methods of the 
invention using homologous or homeologous recombination. One
advantage of this approach is that, because the random
shuffling of genetic materials occurs selectively between DNA
regions that are homologous or at least partially homologous,
the number of combinations that result in an inactive gene or
unproductive metabolic pathway will be reduced. This in turn
will reduce the number of clones that need to be screened in
a recombined combinatorial gene expression library for the
biological activities of interest. Depending on the design
of the experiment, the products of the in vitro recombination
reaction can be recovered and used as starting material for
the construction of combinatorial gene expression libraries
of the invention, in particular, the types of gene expression
libraries as described in Sections 5.1.4 and 5.1.6.

In one embodiment, the combinatorial gene expression 
libraries of the invention are prepared using DNA isolated
from a plurality of species of donor organisms, wherein the
DNA has been subjected to homologous or homeologous
recombination in vitro. The in vitro recombination reaction
can be carried out using cell extracts, prepared by methods
known in the art, such as the methods of Potter & Dressler
(1978, Proc. Natl. Acad. Sci. 75:3698-3702) and Benbow &
Krauss (1977, Cell 12:191-204), the disclosures of which are
incorporated herein by reference. Alternatively, the
recombination reaction can be carried out using purified or
recombinantly-produced enzymes reconstituted in vitro.
Accordingly, the in vitro recombination reaction of the
invention comprises incubating DNA molecules of a plurality
of species of organisms with cell extracts and/or purified
recombination enzymes for a sufficient period of time and
under the appropriate conditions to allow homologous or
homeologous recombination to occur among the DNA molecules.
In preferred embodiments, the cell extracts and 
recombination enzymes are obtained from bacteria, such as but 
not limited to E. coli. 

An essential recombination enzyme that facilitates
homologous pairing and DNA strand exchange in an
ATP-dependent manner is the recA protein of E. coli. The
structure of recA and its activity in vitro are well known
and have been extensively described in reviews, for example,
Roca & Cox (1997, Prog. Nuc. Acid Res. Molec. Biol. 56:129-223;
1990, Crit. Rev. Biochem. Molec. Biol. 25:415-456), the
disclosures of which are incorporated herein by reference. A
preferred embodiment of the invention utilizes E. coli recA
protein in the in vitro recombination reaction. Although E.
coli recA is used herein to illustrate the invention,
homologs of recA, and other proteins, enzymes or protein
complexes of similar function, in prokaryotes and eukaryotes
can also be used. Unlike other in vitro mutagenesis methods
such as DNA shuffling by random fragmentation and PCR-based
reassembly (Stemmer, 1994, Nature 370:389-91; Stemmer, 1994,
Proc. Natl. Acad. Sci. 91:10747-51), the methods of the
invention use a recombination enzyme, such as recA and its
homologs, to facilitate DNA strand exchange.

E. coli recA protein promotes stable interaction between 
a single-stranded DNA and a homologous duplex DNA, resulting 
in the formation of a complex of recA with three DNA strands. 
RecA-mediated DNA strand exchange reaction can also involve 
four strands of DNA. The process is typically initiated with 
the formation of a recA-nucleoprotein filament in which recA 
coats the single stranded or gapped region of a first DNA 
substrate. The recA-nucleoprotein filament becomes a 
sequence-specific DNA binding entity that searches for 
sequence homology and becomes aligned with a second linear 
duplex DNA substrate. This is followed by a strand switch to
create a crossover junction (a Holliday junction if four
strands are involved). Extensive regions of heteroduplex DNA
containing insertions, deletions and substitutions are then
produced by a unidirectional, facilitated branch migration
process that is accompanied by ATP hydrolysis. Resolution of
the junction is effected by DNA strand cleavage(s). The
other E. coli proteins that may be used in an in vitro
recombination reaction include recBCD, SSB, DNA polymerase I,
DNA ligase, DNA gyrase, and gene products of ruvAB and ruvC,
which are described in more detail in Kowalczykowski (1994,
Experientia, 50:204-215), the disclosure of which is
incorporated herein by reference. These proteins are
generally present in a cell extract of E. coli, and if
desired, can be supplemented individually as a purified
protein or recombinantly produced protein. Specialized
recombination-proficient cells can also be used to prepare
the cell extracts for the recombination reaction.
Preferably, mutant strains of E. coli that are defective in
one or more steps in DNA mismatch repair can be used. For
example, the cell extracts prepared from such mutants may
lack one or more of the gene products encoded by the mutL,
mutS, mutH, and mutU genes.

The use of other DNA strand exchange proteins that
function independently of ATP is also within the scope of
the present invention. See Eggleston & Kowalczykowski (1991,
Biochimie 73:163-176), the disclosure of which is
incorporated herein by reference.

The recombination process can be applied to genetic 
materials directly obtained from a plurality of donor 
organisms. Techniques described in sections 5.1.2 and 5.3 
and known in the art can be applied to isolate and/or purify 
the DNA. Thus, for example, if the donor organisms are known
and their DNA molecules are separately purified, equimolar
amounts of DNA from each species of donor organism can be
mixed in the recombination reaction. To improve efficiency
of the method, the starting population of DNA can be enriched
for a preselected property using hybridization techniques,
such as those described in Section 5.1.6 for making a biased
combinatorial gene expression library. The preselected DNA
may display sequence similarities to nucleotide sequences
that encode proteins that form a complete or partial
metabolic pathway of interest. Thus, for example, DNA
encoding enzymes that catalyze various steps of a metabolic
pathway of interest from different species of organisms can
be mixed; and homologous or homeologous recombination among 
these DNA molecules in vitro is facilitated by the recA 
protein. DNA strand exchange will occur between the DNA 
molecules in regions where there are DNA sequence 
similarities, resulting in the formation of heteroduplex DNA 
and multi-strand DNA junctions. The product of the 
recombination process can be recovered by introducing into an 
intermediate bacterial host, for example, by DNA 
transformation or electroporation, in which heteroduplex DNA 
and multi-strand DNA junctions can be resolved. 
Alternatively, the recombined DNA can be recovered by 
repairing the DNA in vitro (if desired), cutting the DNA with
the appropriate restriction endonuclease(s) to an appropriate
size, and cloning into expression vectors using techniques
known in the art and/or techniques described in Sections
5.3.7, 5.4 and 5.5.
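
Mixing equal molar amounts of donor DNA, as mentioned earlier in this
section, requires converting a molar target into a mass for each
preparation, because longer fragments weigh more per mole. The
following is a minimal sketch of that arithmetic. It assumes
double-stranded DNA and the commonly used approximation of about 650
g/mol per base pair; the donor names, mean fragment lengths and the
0.1 pmol target are hypothetical.

```python
AVG_BP_MASS = 650.0  # approximate g/mol per base pair of double-stranded DNA


def mass_ng_for_pmol(length_bp, pmol):
    """Mass in ng of `pmol` picomoles of a dsDNA fragment of `length_bp`."""
    # pmol * (g/mol) gives picograms; divide by 1000 for nanograms.
    return pmol * length_bp * AVG_BP_MASS / 1000.0


# Hypothetical mean fragment lengths (bp) for three donor DNA preparations.
donors = {"donor_A": 40000, "donor_B": 35000, "donor_C": 8000}
for name, length in donors.items():
    print(name, round(mass_ng_for_pmol(length, pmol=0.1), 1), "ng")
```

Equal masses of these preparations would over-represent the donor
with the shortest fragments on a per-molecule basis, which is why the
conversion is done per preparation.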

The DNA used in the recombination reaction can be single 
stranded DNA, double stranded DNA, linear DNA, and circular 
DNA. Double stranded DNA can be treated before the 
recombination reaction by an agent such as but not limited to 
ultraviolet light, chemicals and/or enzymes to generate 
double-stranded DNA with partially single-stranded regions, 
gaps, or nicks along the molecule so that it can serve 
effectively as a substrate for binding by the recombination 
enzymes. If desired, highly repetitive DNA sequences may 
also be removed by techniques known in the art prior to
recombination; see, for example, Section 5.5.1.

In a preferred embodiment of the invention, the 
recombination process is directed towards a set of DNA 
molecules of interest, which, for example, are known to 
encode enzymes of one or more metabolic pathways of interest. 
This set of DNA molecules, herein referred to as target DNA 
molecules, can be isolated and/or enriched by using 
hybridization, cloning, and amplification techniques, such as 
those known in the art and described for making a biased 
combinatorial gene expression library. The target DNA 
molecules can be derived from one or more species of 
organisms. Preferably, the target DNA molecules are each 
attached to a cloning vector sequence that is functional in 
an appropriate cell and thus useful for recovering and
propagating the recombined DNA fragments. Optionally, each 
of the target DNA molecules may contain a selectable marker 
which facilitates selection and recovery. 

The target DNA molecules are incubated with a pool of 
DNA derived from one or more species of organisms, herein 
referred to as substrate DNA. The substrate DNA provides the 
desired genetic diversity, and comprises DNA sequences from 
diverse sources but which share sequence similarities with 
those of the target DNA molecules. Preferably, the substrate 
DNA molecules are added in molar excess to the target DNA to 
drive the recombination reaction. To enhance the efficiency
of the recombination reaction, the substrate DNA is
pretreated with an agent and/or denatured to generate single
stranded regions and gaps. For example, DNA obtained from an
environmental sample can first be cloned in a cosmid vector
in which the DNA is flanked by rare-cutting restriction 
enzyme sites. The substrate DNA may be excised, purified, 
concentrated, and then treated with a base to form single 
stranded regions before use in recombination reactions. 
Under the appropriate conditions and in the presence of 
proteins, such as the E. coli recA protein, the substrate DNA 
will recombine with the target DNA. The products of the 
recombination process can be introduced into a bacterial 
host, for example, by DNA transformation or electroporation, 
in which heteroduplex DNA and multi-strand DNA junctions are 
resolved. Recombined DNA fragments can be recovered from the 
bacterial host for further manipulations, or used directly 
for expression in the host. 

The process of recA-mediated recombination starts with a 
search for homology by a single stranded DNA-recA complex 
followed by pairing of homologous or partly homologous DNAs. 
The process of homeologous recombination involves insertions, 
deletions and mismatches (Bianchi & Radding, 1983, Cell 
35:511-520), and the result of the process can be affected by 
the reaction conditions. Accordingly, the in vitro 
recombination reactions of the invention can be manipulated 
to create a range of reaction conditions that favor 
recombination among DNA molecules with different degrees of 
sequence similarities; or that favor different extents of DNA 
strand exchange. Reaction conditions, such as ionic
compositions and concentrations, protein cofactor
concentrations, and nucleotide cofactor concentrations, can be
varied as described in Malkov et al. (1997, J. Mol. Biol.
271:168-177); Bertrand et al. (1995, Biochimie, 77:840-7);
Rougee et al. (1992, Biochem. 31:9269-9278); Roberts &
Crothers (1991, Proc. Natl. Acad. Sci. 88:9397-9401);
Honigberg et al. (1986, Proc. Natl. Acad. Sci. 83:9586-9590);
and Soltis & Lehman (1984, J. Biol. Chem. 259:12020-4);
the disclosures of which are incorporated herein by
reference. For example, increasing the ionic concentration
of the reaction from 0 mM potassium chloride to 100 mM
increased the stringency of the reaction, i.e., high salt
concentration has an inhibitory effect on the formation of
the triple-stranded complex with mismatches. It is one
aspect of the invention to modulate the specificity and 
stringency of the in vitro homeologous recombination reaction 
so as to generate further diversity with mismatched cDNA and 
genomic DNA fragments. 

Accordingly, in one preferred embodiment, the invention 
provides a method for making a recombined combinatorial gene 
expression library comprising the following steps: 

(a) treating cDNA or genomic DNA fragments obtained
from a plurality of species of donor organisms
with an agent to create single stranded regions;

(b) incubating the cDNA or genomic DNA fragments
with DNA fragments encoding proteins that form a
part of a metabolic pathway of interest, in a
reaction comprising a recA protein for a time
sufficient to allow recombination to occur;

(c) recovering the recombined cDNA or genomic DNA
fragments; and

(d) ligating a DNA vector to the recombined cDNA or
genomic DNA fragments to form expression
constructs,

wherein the genes contained in the cDNA or genomic DNA
fragments are operably-associated with their native or
exogenous regulatory regions which drive expression of the
genes in an appropriate host cell. Prior to step (a) the
cDNA or genomic DNA can optionally be preselected or enriched
for sequences displaying sequence similarities to nucleotide
sequences encoding proteins that form a part of the metabolic
pathway of interest.

In another more preferred embodiment, the invention
provides a method for making a recombined combinatorial gene
expression library using preselected substrate DNA and
presented target DNA molecules which are present in a
cloning vector. The method comprises the following steps:

(a) preselecting or enriching for cDNA or genomic
DNA fragments (i.e., substrate DNA) obtained from
a plurality of species of donor organisms that
display sequence similarities to
nucleotide sequences encoding proteins that form
a part of a metabolic pathway of interest;

(b) treating the substrate cDNA or genomic DNA
fragments with an agent to create single
stranded regions;

(c) incubating the substrate cDNA or genomic DNA
fragments with target DNA fragments that encode
proteins that form all or a part of a related
metabolic pathway of interest, in a reaction
comprising the E. coli recA protein for a time
sufficient to allow recombination to occur; and

(d) introducing the recombined cDNA or genomic DNA
fragments into a host;

wherein the target DNA fragments comprise cloning vector
sequences that allow propagation of the recombined DNA
fragments in the host. If exogenous regulatory regions are
present in the cloning vector sequence, the genes contained
in the recombined cDNA or genomic DNA fragments can become
operably-associated with such exogenous regulatory regions
which will drive expression of the genes in an appropriate
host cell. The genes in the recombined DNA fragments can also
be expressed using their native regulatory regions in the
appropriate host cells.

In another embodiment of the invention, the 
recombination process is carried out in vivo. The in vivo 
reaction involves first, the introduction of a target DNA as 
described above, into the chromosome of the host cell along 
with a portion of a positive selection gene locus. The host
organism could either be the intended expression host, such
as those described in Section 5.1.3, e.g., S. lividans, or an
intermediate cell specifically chosen for its permissive
recombination properties (e.g., Bacillus species). Next, the
substrate DNA, which has been cloned in a delivery vector, is
introduced into the host cells. The delivery vector
comprises cloning sites for insertion of the substrate DNA; a
portion of a positive selection gene locus that is
complementary to the portion of the same gene locus
integrated into the host chromosome; and a negative selection
marker (e.g., glucose kinase, glkA) that is located distal to
the positive selection marker and the substrate DNA cloning
sites. When substrate DNA recombines with target DNA in
vivo, the respective portions of the positive selection gene
locus on the delivery vector and the host chromosome are
joined and the locus becomes functional. A second round of
recombination occurs in vivo and causes the excision of the
negative selection marker. Selection for the positive
selection gene locus and against the negative selection
marker allows the identification of host cells in which the
desirable directed recombination took place. The host cells
containing recombined DNA can either be used directly for
expression, or used for preparing copies of the recombined
DNA for introduction into an expression host.
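
The two-step selection logic described above can be summarized as a
simple truth test on each host cell. The sketch below is only a
schematic restatement: the class, field names and the three example
cell states are hypothetical labels introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # True once the first recombination joins the two portions of the
    # positive selection gene locus into a functional locus.
    positive_locus_joined: bool
    # False once the second recombination has excised the negative marker.
    negative_marker_present: bool


def survives_selection(cell):
    """Select for the positive locus and against the negative marker."""
    return cell.positive_locus_joined and not cell.negative_marker_present


cells = {
    "no recombination": Cell(False, False),
    "first round only": Cell(True, True),
    "both rounds": Cell(True, False),
}
selected = [name for name, cell in cells.items() if survives_selection(cell)]
print(selected)
```

Only cells in which both rounds of recombination took place pass both
selections, which is the population carrying the directed
recombination product.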

Combinatorial gene expression libraries containing 
recombined DNA can be screened by methods described in 
section 5.2 and methods known in the art. 

5.2. SCREENING COMBINATORIAL EXPRESSION LIBRARIES
The drug discovery system of the present invention
further encompasses novel methods for screening combinatorial
expression libraries. While standard methods of screening
expression libraries, such as antibody binding and ligand
binding, can also be used with expression libraries of the
present invention, the libraries can be adapted to a reporter
regimen tailored to identify host organisms that are
expressing the desirable pathways and metabolic products.
The methods claimed herein enable the management of
large sample numbers with minimal handling to permit
efficient and high-throughput detection and isolation of
productive clones in the library. The libraries may be
pre-screened for a broad range of activities, for the production
of a class of compounds or for the presence of relevant DNA
sequences. The libraries may also be used directly with a
target in both in vivo and in vitro assays. The identified
or isolated population of cells may readily be cultured,
expanded in numbers, and subjected to further analysis for
the production of novel compounds. The genes encoding the
metabolic pathway that lead to production of the novel
activity or compound may be delineated by characterizing the
genetic material that was introduced into the isolated
clones. Information on the genes and the pathway, and the
clones, will greatly facilitate drug optimization and
production.

As used herein, the terms "library clones" or "library 
cells" refer to host cells or organisms in a combinatorial 
gene expression library that contain at least one fragment of 
donor DNA that may encode a donor metabolic pathway or a 
20 component thereof. The term "positive clones" or "positive 
cells" refers to library clones or cells that produce a 
signal by virtue of the reporter regimen. The term 
"productive cells" or "productive clones" refers to host 
cells or organisms in the library that produce an activity or 
compound of interest, in distinction from the remaining
"non-productive cells" in the library.

The term "pre-screen" refers to a general biological or 
biochemical assay which indicates the presence of an 
activity, a compound or a gene of interest. The term 
"screen" refers to a specific therapy-oriented biological or 
biochemical assay which is directed to a specific disease or 
clinical condition, and employs a target. The term "target"
refers generally to whole cells as well as macromolecules,
such as enzymes, to which compounds under test are exposed in
a screen. The use of both pre-screens and screens generally
involves visual detection or automated image analysis of a
colorigenic indicator, fluorescence detection by
fluorescence-activated cell sorting (FACS) or the use of a
magnetic cell sorting system (MACS) performed on a population
of library cells in the presence of a reporter regimen.

The methods of the invention provide alternative but not 
mutually exclusive approaches to generation of detectable 
signal associated with productive cells for the purpose of 
detecting and isolating these cells of interest. A reporter 
can be a molecule that enables directly or indirectly the 
generation of a detectable signal. For example, a reporter 
may be a light emitting molecule, or a cell surface molecule 
that may be recognized specifically by other components of 
the regimen. A reporter regimen comprises a reporter and 
compositions that enable and support signal generation by the 
reporter. The reporter regimen may include live indicator 
cells, or portions thereof. Components of a reporter regimen 
may be incorporated into the host organisms of the library, 
or they may be co-encapsulated with individual or pools of 
library cells in a permeable semi-solid medium to form a 
discrete unit for screening. 

To facilitate detection of compounds of interest as 
described in the following text, absorptive materials such as 
neutral resins, e.g., Diaion HP20 or Amberlite XAD-8 resin, 
may be added to cultures of library cells (Lam et al. 1995, J
Industrial Microbiol 15:453-456). Since many secondary
metabolites are hydrophobic molecules, the release or
secretion of such metabolites may lead to precipitation on
the cell exterior. Inclusion of such resins in the culture
causes the metabolites to be sequestered on the resin, which
may be removed from the culture for elution and screening.

In one embodiment of the invention, the host organisms
are engineered to contain a chemoresponsive construct,
comprising a gene encoding a reporter molecule
operably-associated with a chemoresponsive promoter that
responds to the desired class of compounds or metabolites to
be screened
in the expression library. In the presence of the desirable 
activity or compound, the chemoresponsive promoter in a 
positive clone is induced to initiate transcription of the 
operably-associated reporter gene. The positive cell is 
identified by detectable signals generated by the expression 
of the reporter gene. 

In an alternative embodiment, a physiological probe can 
be used which generates a signal in response to a 
physiological change in individual cells as a result of the 
presence of a desirable activity or compound. Such a probe 
may be a precursor of a reporter molecule that is converted 
directly or indirectly to the reporter molecule by an 
activity or compound in the biochemical pathway sought. Upon 
contact with a productive cell, the physiological probe or 
reporter precursor generates a detectable signal which 
enables identification and/or isolation of the productive 
cell. Contact may be effected by direct addition of the
probe or precursor to the library cells. Alternatively, 
contact may be effected by encapsulation and diffusion of the 
probe or precursor to the library cells during screening.

In yet another embodiment of the invention, indicator
cells may be used to signal the production of a desirable 
activity or compound, thereby enabling identification and/or 
isolation of productive cells in the library. Whole live or 
fixed indicator cells, or cellular fractions thereof may be 
mixed or co-encapsulated with individual or pools of library 
cells. Indicator cells are selected for their biological
properties, which are responsive to the presence of the
desirable activity or compound. Indicator cells may be the 
target cells of the desirable compound. Alternatively, 
indicator cells may be used in conjunction with a reporter to 
generate a detectable signal. 

Pre-screens and screens for each library are chosen 
after comprehensive characterization of the host organism 
and, whenever possible, of the donor organisms. Assays in 
which the host organisms are positive are disqualified, while
assays in which the donor organisms are positive are
considered acceptable library pre-screens or screens.
Substrates are preferably the targets of enzymes relevant to
desirable biosynthetic capabilities; alternatively,
irrelevant targets (e.g., amylase, β-galactosidase) may be
used that indicate the presence of transcriptional and
translational activity for the DNA in a specific clone.
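
The assay selection rule in the preceding paragraph can be sketched in code. This is only an illustrative sketch: the AssayResult record, the function names and the "undetermined" label for assays in which neither host nor donor is positive are assumptions made here, since the text does not address that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayResult:
    """Outcome of one candidate assay run on the unmodified host and on the donor.

    The field names are hypothetical; they are not taken from the specification.
    """
    name: str
    host_positive: bool
    donor_positive: bool


def classify_assay(result: AssayResult) -> str:
    """Apply the rule stated in the text.

    host positive  -> "disqualified" (host background would mask library hits)
    donor positive -> "acceptable"
    neither        -> "undetermined" (assumption: the text is silent on this case)
    """
    if result.host_positive:
        return "disqualified"
    if result.donor_positive:
        return "acceptable"
    return "undetermined"


def acceptable_assays(results):
    """Return the names of assays usable as library pre-screens or screens."""
    return [r.name for r in results if classify_assay(r) == "acceptable"]


if __name__ == "__main__":
    # Hypothetical characterization panel, for illustration only.
    panel = [
        AssayResult("amylase", host_positive=True, donor_positive=True),
        AssayResult("antifungal_lawn", host_positive=False, donor_positive=True),
        AssayResult("pigment", host_positive=False, donor_positive=False),
    ]
    for entry in panel:
        print(entry.name, classify_assay(entry))
```

Checking the host first reflects the order of the rule as written: an assay that the host already scores positive in cannot distinguish library clones, whatever the donor does.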

In yet another embodiment of the invention, antibiotic 
resistance may be used as an indicator of production or 
potential production of interesting secondary metabolites.
When library clones are exposed to a panel of antibiotics,
resistance to the antibiotics may indicate the presence of a
self-defense mechanism, such as efflux pumps which are
frequently found adjacent to secondary metabolite
biosynthetic pathways as protection against auto-toxicity.
Such clones may not exhibit secondary metabolite production
at the time of detection, but have increased probability of 
containing adjacent biosynthetic pathways that can be further 
manipulated or examined as desired. 

The present invention also provides encapsulation as an 
efficient high-throughput method for growing cells in a 
confined space, replacing the classic method of growing
bacteria in petri dishes. Growing cells in a plate format is
both labor- and materials-intensive, while encapsulated
cells can be grown easily in a liquid culture with the
advantage that dividing cells are kept together, thus
facilitating detection of interesting secondary metabolites.
Another advantage of encapsulation is the ability to
co-encapsulate components of the reporter regimen and/or other
indicator cells with library cells so that pre-screening or
screening may be performed in a discrete unit. Encapsulation
of cells can be performed easily by means of thermal or ionic
gelation using materials such as agarose, alginate or
carrageenan.

FACS is a well-known method for separating particles 
(1-130 μm in size) based on the fluorescent properties of the
particles (Kamarch, 1987, Methods Enzymol, 151:150-165).
FACS works on the basis of laser excitation of fluorescent
moieties in the individual particles. Positive fluorescence
results in addition of a small electrical charge to the
particle. The charge allows electromagnetic separation of
positive and negative particles from a mixture. Separated
particles may be directly deposited into individual wells of 
96-well or 384-well plates. 

MACS is a well-known method for separating particles 
based on their ability to bind magnetic microspheres (0.5-100
μm diameter) (Dynal, 1995). A variety of useful
modifications can be performed on the magnetic microspheres
including covalent addition of antibody which specifically
recognizes a cell-surface antigen or hapten. Alternatively,
for magnetization of encapsulated cells, a reporter regimen
can be incorporated into host cells that generate
magnetogenic reporter proteins, such as ferritin. In this
case, encapsulated cells that generate a positive signal act
as magnetic microspheres. The selected microspheres can be
physically manipulated by exposure to a magnetic field. For
example, the selected microspheres may be sequestered by
application of a magnet to the outside of the reaction
vessel.



5.2.1. REPORTER CONSTRUCTS 

According to the present invention, the host organisms 
in the library may be engineered to contain a chemoresponsive
reporter construct comprising a chemoresponsive promoter
operably-associated with a reporter gene. The host organism 
and/or the construct may contain other genes encoding 
accessory proteins that are involved in the regulation of 
transcription from the chemoresponsive promoter or the 
production of signals. 

A chemoresponsive promoter is any double-stranded DNA 
sequence that is capable of binding an RNA polymerase and 
initiating or modulating transcription of an
operably-associated reporter gene only in the presence of a
certain kind of activity or a certain class of compounds.
Preferably, the chemoresponsive promoter has no or only a
negligible level of constitutive background transcriptional
activity in the host organism in the absence of the inducing
activity or compound. A chemoresponsive promoter that
responds negatively to the presence of an activity or compound
by decreasing or ceasing transcriptional activity may also be 
used. 

Promoters useful in the present invention may include, 
but are not limited to, promoters for metabolic pathways, 
biodegradative pathways, cytochromes and stress response 
(Orser et al. 1995, In vitro Toxicol 8:71-85), such as heat 
shock proteins. For example, the Pm promoter of the 
Pseudomonas TOL plasmid meta-cleavage pathway and its 
positive regulator XylS protein, which is inducible and
modulated by a range of benzoates and halo- or alkylaromatic
compounds, may be used (Ramos et al. 1988, FEBS Letters
226:241-246; de Lorenzo et al. 1993, Gene 130:41-46; Ramos et
al. 1986, Proc Natl Acad Sci 83:8467-8471; Mermod et al.
1986, J. Bacteriol 167:447-454). Other non-limiting examples
of chemoresponsive promoters are promoters relating to
phosphonate utilization (Metcalf et al. 1993, J Bacteriol
175:3430-3442); promoters sensitive to cis-cis-muconate
(Rothmel, 1990); promoters sensitive to antibiotics and
salicylates (Cohen et al. 1993, J Bacteriol, 175:7856-7862;
Cohen et al. 1993, J. Bacteriol, 175:1484-1492); promoters
from the arsenic and cadmium operons from Staphylococcus
aureus (Corbisier et al, 1993, FEMS Letters 110:231-238);
sfiA (Quillardet et al. 1982, Proc Natl Acad Sci
79:5971-5975); and zwf (Orser et al., 1995, supra).

A reporter gene encodes a reporter molecule which is 
capable of directly or indirectly generating a detectable 
signal. Colorigenic or magnetogenic reporters may be used,
as well as any light-emitting reporter such as
bioluminescent, chemiluminescent or fluorescent proteins,
which include but are not limited to the green
fluorescent protein (GFP) of Aequorea victoria (Chalfie et
al. 1994, Science 263:802-805), a modified GFP with enhanced
fluorescence (Heim et al. 1995, Nature 373:663-4), the
luciferase (luxAB gene product) of Vibrio harveyi (Karp,
1989, Biochim Biophys Acta 1007:84-90; Stewart et al. 1992, J
Gen Microbiol, 138:1289-1300), and the luciferase from
firefly, Photinus pyralis (De Wet et al. 1987, Mol Cell Biol
7:725-737). Any fluorigenic or colorigenic enzymes may be
used, which include but are not limited to beta-galactosidase
(LacZ, Nolan et al. 1988, Proc Natl Acad Sci USA
85:2603-2607), and alkaline phosphatase. Any cell surface
antigen may be used, for example, E. coli
thioredoxin-flagellin fusion protein, i.e., E. coli
thioredoxin (the trxA gene) expressed as a fusion protein
with flagellin (the fliC gene) on the surface of E. coli
flagella (Lu et al. 1995,
Bio/Technology 13:366-372). 

An exemplary chemoresponsive reporter construct provided 
herein is pERD-20-GFP which contains the Pm promoter and the 
XylS gene of Pseudomonas (Ramos et al. 1988, FEBS Letters
226:241-246) that are responsive to certain classes of
benzoates, resulting in transcription and translation
(expression) of the reporter, GFP (see Figure 6).

Different promoter sequences may be generated by PCR and 
attached to the coding regions of GFP or flagellin- 
thioredoxin reporter. Genomic and plasmid DNA containing the 
promoter of interest may be purified from the relevant 
species using standard DNA purification methods, and
resuspended in TE. Primers may be synthesized corresponding
to the 5' and 3' boundaries of the promoter regions with
additional sequences of restriction sites to facilitate
subcloning. The amplification reactions are carried out in a
thermocycler under conditions determined to be acceptable for
the selected template and primers. The reaction products are
separated by agarose gel electrophoresis, and subcloned using
the TA Cloning Kit (Invitrogen, La Jolla). The amplified
promoter sequences may be recloned into a general purpose
cloning vector in a context 5' to the GFP or flagellin-
thioredoxin cDNA.

5.2.2. PHYSIOLOGICAL PROBES AND REPORTER PRECURSORS
A physiological probe as used herein is a fluorescent or 
colorigenic agent which, upon contact or entry, generates a
signal in response to changes in physiological and/or
metabolic parameters of a library cell or indicator cell.

The probe can be an enzyme substrate linked to a
fluorogenic agent. For example, a fluorogenic alkyl ether
can be incubated with the cells. If the cell is producing
polyaromatic hydrocarbons, the hydrocarbons can induce
microsomal dealkylases, which in turn cleave the fluorogenic
alkyl ether, yielding a fluorescent product.

Fluorescent probes may be selected for detection of
changes in the following physiological and metabolic
parameters, such as, but not limited to, those described in
Shechter, et al. (1982, FEBS Letters 139:121-124), and
Bronstein et al. (Anal Biochem 219:169-81).



Metabolic activity     Cause                   Stain/Substrate
                       (specific example)      (class of chemical)

Decrease in            Stress, injury          BacLight stain
membrane potential     (isopropanol)           (semi-permeant
                                               nucleic acid stain)

Intracellular pH       Physiological           BCECF-AM
                       changes                 (lipophilic
                                               acetoxymethyl ester
                                               of phenolic fluor)

Increase in            Induction of            7-ethoxy-
cytochrome-mediated    microsomal              heptadecyl-coumarin
oxidation              dealkylases by          (fluorogenic alkyl
                       polyaromatic            ether)
                       hydrocarbons
                       (naphthalene)



5.2.3. PRE-SCREENING AND SCREENING OF THE LIBRARY
The combinatorial gene expression libraries of the
invention may be pre-screened or screened by a variety of
methods, including but not limited to, visual inspection,
automated image analysis, hybridization to molecular beacon
DNA probes (Tyagi et al. 1996, Nature Biotechnol, 14:303-308),
fluorescence activated cell sorting (FACS) and magnetic cell
sorting (MACS). Screening may be performed on bulk cultures
of unamplified or amplified libraries. 

In specific embodiments of the invention, individual or 
pools of library cells are encapsulated in an inert, stable 
and porous semi-solid matrix in the form of droplets during 
pre-screening or screening. The semi-solid matrix is 
permeable to gas, liquid, as well as macromolecules, and
permits the growth and division of encapsulated cells. 
Examples of suitable matrices may include but are not limited 
to agarose, alginate, and carrageenan. The encapsulated 
library cells may be cultured and tested in the droplets, and 
remain viable so that the cells may be recovered from the 
droplets for further manipulations. The matrix may
optionally be exposed to substances, such as an antibiotic
which can select for library cells that contain a selectable
marker. The droplets may also be exposed to nutrients to
support the growth of library cells. The following examples
are offered by way of illustration and are not intended to
limit the invention in any manner.

Encapsulation may be performed in one of many ways, 
producing either macrodroplets (droplets from 0.5 to 2.5 mm) 
or microdroplets (droplets from 10 to 250 μm) depending upon
the method of detection employed during subsequent
pre-screening or screening. The size and the composition of
the droplets may be controlled during formation of the droplets.
Preferably, each macrodroplet or microdroplet will contain 
one to five library cells. 
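
As a rough planning aid for the target of one to five cells per droplet, the cell density needed in the gel solution can be estimated from the droplet diameter. The sketch below assumes spherical droplets and random (Poisson) distribution of cells among droplets; the Poisson model and all function names are assumptions introduced here, not part of the described method.

```python
import math

UM3_PER_ML = 1e12  # 1 ml = 1 cm^3 = (1e4 um)^3


def droplet_volume_ml(diameter_um: float) -> float:
    """Volume in ml of a spherical droplet of the given diameter in micrometres."""
    return (math.pi / 6.0) * diameter_um ** 3 / UM3_PER_ML


def cells_per_ml_for_mean(diameter_um: float, mean_cells: float) -> float:
    """Cell density in the gel solution that gives `mean_cells` per droplet on average."""
    return mean_cells / droplet_volume_ml(diameter_um)


def fraction_in_range(mean_cells: float, low: int = 1, high: int = 5) -> float:
    """Poisson probability that a droplet holds between `low` and `high` cells."""
    return sum(
        math.exp(-mean_cells) * mean_cells ** k / math.factorial(k)
        for k in range(low, high + 1)
    )


if __name__ == "__main__":
    # Illustrative values: a 100 um microdroplet and a 2 mm macrodroplet.
    for diameter in (100.0, 2000.0):
        density = cells_per_ml_for_mean(diameter, mean_cells=2.0)
        print(f"{diameter:.0f} um droplet: {density:.3g} cells/ml for a mean of 2")
    print(f"fraction of droplets with 1-5 cells: {fraction_in_range(2.0):.3f}")
```

Under these assumptions a mean of about two cells per droplet leaves most droplets within the preferred range while keeping the fraction of empty droplets modest.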

For example, macrodroplets may be prepared using sodium 
alginate as follows: sodium alginate is dissolved in 100 mL 
of sterile water at a concentration of 1% using an overhead 
mixer at 2000 rpm. A volume of library cells of E. coli or
yeast, such as Schizosaccharomyces pombe and Saccharomyces
species, or spores of Streptomyces species, Bacillus
subtilis, and filamentous fungi such as Aspergillus and
Neurospora species, is added to the sodium alginate solution
so that 1-5 cells are encapsulated per droplet. The mixture
is allowed to sit for at least 30 minutes to degas, and is
then extruded through any device that causes the formation of
discrete droplets. One such device is a syringe with a 25
gauge needle. The droplets are formed by adding the sodium
alginate solution drop-wise into a beaker of gently stirring
135 mM calcium chloride solution. Droplets are allowed to
solidify for 10 minutes, and are then transferred to a
sterile flask where the calcium chloride solution is removed
and replaced with a suitable growth medium. Encapsulated
library cells can be grown under standard conditions. 

Microdroplets may be generated by any method or device 
that produces small droplets, such as but not limited to, 
a two-fluid annular atomizer, an electrostatic droplet
generator, a vibrating orifice system, and emulsification.
Other methods for preparing semi-solid droplets are well
known in the art; see for example, Weaver, U.S. patent
4,399,219.

The following example is a protocol for producing 
microdroplets using the emulsification technique (Monshipouri
et al. 1995, J. Microencapsulation, 12:255-262). Using an
overhead mixer at 2000 rpm, 0.6 g sodium polyphosphate and 2%
sodium alginate are dissolved in 100 ml sterile water, and
the alginate solution is allowed to degas for 60 minutes. An
oil phase is prepared by mixing 300 ml oil, such as canola or
olive oil, with 1.0 g purified soy bean lecithin for at least
30 minutes. A slurry containing 1.9 g calcium sulphate in 10
ml 50% glycerol is prepared by sonication for at least 15
minutes. This slurry and a volume of library cells which
will yield 1-5 cells per droplet are blended into the
alginate solution immediately before introduction to the oil
phase. The emulsification process is initiated by slowly
transferring the alginate mixture into the oil phase and
mixing for 10 minutes at 580 rpm. 500 ml sterile water is
then added and the mixing allowed to continue for 5 minutes.
Microdroplets can then be removed from the oil by
centrifugation. The microdroplets are washed and resuspended
in a suitable growth medium, ready for culture under standard
conditions if required. The size of the droplets can be 
examined by phase microscopy. For the purpose of sorting by 
FACS or MACS, if the droplets are outside of the desired size 
range necessary for sorting, the droplets can be size 
selected using a filter membrane of the required size limit.

According to the invention, components of the reporter
regimen or the target of a drug screen may also be
co-encapsulated in a drop with library cell(s). Whole indicator
cells or cellular fractions containing a bioassay, enzymes,
or reporter molecules may be mixed with library cells
suspended in the medium prior to formation of macro- or
micro-droplets as previously described. Compounds of
interest produced by the library cells may accumulate and
diffuse within the droplet to reach the co-encapsulated
indicator cells or reporter, and generate a signal. The
co-encapsulated indicator cell may be a live target of the
desirable compound, e.g. pathogens for anti-infectives, or
tumor cells for anticancer agents. Any change in metabolic
status of the indicator cells, such as death, or growth
inhibition, constitutes a signal and may be detected within
the droplet by a variety of methods known in the art. Such
methods may include but are not limited to the use of
physiological probes, such as vital stains, or measurement of 
optical properties of the drop. 

When the droplets are exposed to components of the 
reporter regimen, metabolites and compounds produced by the 
encapsulated library cells and the reporter components may 
diffuse through the semi-solid medium to produce a signal. 
For example, a physiological probe may be added to a batch of 
droplets which are then subjected to the appropriate sorting
format. If the library cell(s) are allowed to divide within
the drop, the progeny of the original positive cell(s) are
kept together in a microcolony, thereby generating a stronger
signal. It is preferable that the semi-solid medium is
optically compatible with the signal generated by the
reporter, e.g. transparent to light for a range of
wavelengths, so that the signal can be efficiently detected.

Macrodroplets can be sorted using a colorigenic reporter 
either by screening by eye or by using any device that allows 
the droplets to pass through a screening point, and which has 
the capacity to segregate positives. Microdroplets can be 
sorted using either FACS or MACS. FACS is performed
by a qualified operator on any suitable machine (e.g.
Becton-Dickinson FACStar Plus). Particle suspension
densities (cells or droplets) are adjusted to 1 x 10^6
particles/ml. In all cases, positives can be sorted directly
into multi-well plates at 1 clone per well. MACS is
performed using an MPC-M magnetic tube rack following the
manufacturer's instructions (Dynal, 5 Delaware Drive, Lake
Success, New York 11042). 

Encapsulated cells which are found to be positive in a 
pre-screen or screen can be recovered by culturing the 
droplet by placing it either on appropriate agar or liquid 
growth media or by dissolving the droplet in sodium citrate. 
After a period of culturing, the positive cells may grow out 
of the droplet. For convenience in handling and storage of 
droplets, the subsequent culturing may be done in multi-well 
plates.

Pre-screened positives which have been reduced to a 
smaller population can then either be frozen and stored in 
the presence of glycerol or grown in multi-well plates. 
These can be used to transfer groups of clones using multi- 
pin replicators onto various types of assay plates (e.g. 
differential media, selective media, antimicrobial or 
engineered assay lawns). Specific assays can also be
performed within these microtiter plates and read by a
standard plate reader or any other format used in current
high-throughput screening technologies.

For clarity of discussion, the following subsections
describe in more detail the different embodiments of the
invention involving prokaryotic and eukaryotic, donor and
host organisms. The following embodiments are exemplary and
are not intended to be limiting. 

5.3. PROTOCOLS FOR THE PREPARATION OF HIGH
QUALITY NUCLEIC ACIDS FROM DONOR ORGANISMS
The availability of high quality DNA or RNA as starting
material is important in the construction of DNA libraries
that are representative of the genetic information of the
donor organisms. Methods for extracting, selecting and
preparing high quality nucleic acids from cultures of donor
organisms or from environmental samples are provided in this
section. A method for preparing subtracted DNA probes to be
used in pre-screening DNA libraries for the purpose of
enriching DNA related to secondary metabolism is also
described. 

5.3.1. GUANIDINE ISOTHIOCYANATE NUCLEIC ACID

Lyophilized or non-lyophilized material can be disrupted
by passage through a mechanical grinder, or alternatively by
hand in a mortar and pestle in the presence of fine ground
glass or pumice. Immediately after grinding, ground
lyophilized material may be mixed with 10 ml of lysis buffer
per 1-2 g of material. Lysis buffer is 5M guanidine
isothiocyanate, 50 mM Hepes pH 7.6, 10 mM EDTA, and 5%
β-mercaptoethanol (or 250 mM DTT). After mixing and
incubation at 50°C for 5 minutes, the solution is rendered to
4% sarcosyl, mixed, and incubated for 5 minutes more at 50°C
prior to centrifugation at 8000 g. If the supernatants are
visibly cloudy, a 90-minute centrifugation step at 27,000g may
be used to sediment unwanted carbohydrates. Alternatively, a
15,000g spin may be used to clear the lysate of unwanted
contaminants. Following centrifugation, the supernatant is
made up to 1.42M CsCl (0.15 g CsCl/ml) and layered onto a
previously-made 5.7M CsCl/TE (10 mM Tris-HCl/1 mM EDTA)
solution in ultracentrifugation tubes. Ultra-centrifugation
can be carried out at 160,000g for 18 hours, 20°C. After
ultra-centrifugation, a clear, jelly-like layer at the
1.42M/5.7M CsCl interface is DNA, while total cellular RNA is
present as a clear pellet at the bottom of the tube. 
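
The centrifugation steps above are specified as relative centrifugal force (g), whereas most centrifuges are set by rotor speed. A minimal conversion sketch follows, using the standard relation RCF = 1.118 x 10^-5 x r x rpm^2 with r in cm. The 8 cm rotor radius in the example is a hypothetical value chosen for illustration; the actual radius depends on the rotor used and is not given in the text.

```python
import math

# Standard conversion constant for RCF = k * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius_cm must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 8.0  # hypothetical rotor radius, for illustration only
    for label, rcf in (("lysate clearing", 8000),
                       ("carbohydrate sedimentation", 27000),
                       ("CsCl ultracentrifugation", 160000)):
        print(f"{label}: {rcf} g is about {rpm_from_rcf(rcf, radius_cm):.0f} rpm")
```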

DNA from the ultra-centrifugation step can be dialyzed 
against TE buffer, rendered 0.1M NaCl, precipitated with 2.5 
volumes of ethanol, dried and redissolved in an appropriate 
volume of TE. If the DNA layer is white in color, it can be
removed and recentrifuged for 8 hours in a
CsCl/bisbenzimide gradient to remove remaining
carbohydrates. The dye can be removed by 2-5 washes with 85% 
isopropanol, and the DNA dialyzed and treated as above. 

RNA can be redissolved in resuspension buffer (5M 
guanidine isothiocyanate, 50 mM Hepes, pH 7.6, 10 mM EDTA) and
diluted to 1.33M guanidine isothiocyanate with a solution of
50 mM Hepes pH 7.6, 10 mM EDTA. If total RNA is desired, the
diluted RNA sample is precipitated by the addition of 2 vol 
of ethanol or 1 vol of isopropanol. The precipitated RNA is 
rinsed with 70% ethanol, dried, and resuspended in water or 
formamide, and stored at -70°C until used. 
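
The dilution from 5M to 1.33M guanidine isothiocyanate follows from C1 x V1 = C2 x V2. Because the diluent contains the same Hepes and EDTA concentrations as the resuspension buffer but no guanidine isothiocyanate, only the guanidine concentration changes. The helper below is an illustrative sketch; its name and interface are assumptions made here.

```python
def diluent_volume(sample_volume: float, c_initial: float, c_final: float) -> float:
    """Volume of solute-free diluent to add so that c_initial drops to c_final.

    Derived from C1 * V1 = C2 * (V1 + V_diluent). Volumes may be in any unit,
    provided the same unit is used for input and output.
    """
    if not 0 < c_final <= c_initial:
        raise ValueError("require 0 < c_final <= c_initial")
    return sample_volume * (c_initial / c_final - 1.0)


if __name__ == "__main__":
    # RNA redissolved in 1 ml of 5M guanidine isothiocyanate buffer
    added = diluent_volume(1.0, c_initial=5.0, c_final=1.33)
    print(f"add {added:.2f} ml of 50 mM Hepes pH 7.6, 10 mM EDTA per ml of sample")
```

For each volume of redissolved RNA this works out to roughly 2.76 volumes of diluent.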

5.3.2. ISOLATION OF POLY(A)-CONTAINING RNA
Since the vast majority of eukaryotic mRNA molecules
contain tracts of poly(adenylic) acid at the 3' end, up to 250
bases in length, they can be purified by affinity
chromatography using oligo-dT cellulose matrix. A wide
variety of commercially available oligo-dT matrices may be 
used, including but not limited to, simple gravity columns, 
para-magnetic particles, spin and push columns. Isolated 
mRNA may be stored either dissolved in water, in formamide,
or dried at -70°C.

5.3.3. ENRICHMENT OF NON-RIBOSOMAL SEQUENCES FROM
TOTAL RNA

The enrichment of non-ribosomal sequences may be an 
essential step in obtaining useful RNA populations from 
difficult or uncultivable donor organisms. The
fractionation of RNA on neutral sucrose gradients can be
useful in purifying the predominant ribosomal RNAs away from
other RNA species (R. McGookin 1984, In Methods in Molecular
Biology Vol. 2 Nucleic Acids. Humana Press, pp. 109-112).
Following centrifugation, the samples containing the largest
amounts of ribosomal RNA can be discarded, and the remaining
fractions dialyzed and precipitated. 

Other methods which utilize random primers with or 
without random-tailed oligo-dT primers and PCR may be used to 
amplify low amounts of RNA in starting material. 

5.3.4. FILL-IN REACTION USING THE KLENOW FRAGMENT

The use of the Klenow fragment of E. coli DNA
polymerase, or other DNA polymerase which lacks 3'-5'
exonuclease activity, to add nucleotides to the 5' cohesive
ends is a standard technique often used to create blunt ended
DNA molecules after digestion. When used without a complete
nucleotide set, such an activity can be exploited in creating
ligation ends that are incompatible with themselves but
compatible with each other.
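
The effect of a fill-in with an incomplete nucleotide set can be illustrated with a small model. The choice of enzymes is an assumption made for this example and is not specified in the text above: XhoI leaves a 5'-TCGA overhang and Sau3AI leaves a 5'-GATC overhang. The model treats the polymerase as extending the recessed 3' end base by base until it needs a nucleotide that was not supplied; all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def partial_fill_in(overhang: str, dntps: set) -> str:
    """Return the 5' overhang (written 5'->3') left after a partial fill-in.

    The recessed 3' end is extended opposite the overhang, starting at the
    overhang base nearest the duplex (its 3'-most base) and stopping at the
    first template base whose complement is not among the supplied dNTPs.
    `dntps` names the bases that can be incorporated, e.g. {"T", "C"}.
    """
    remaining = len(overhang)
    while remaining > 0 and COMPLEMENT[overhang[remaining - 1]] in dntps:
        remaining -= 1
    return overhang[:remaining]


def can_anneal(end_a: str, end_b: str) -> bool:
    """True if two 5' overhangs are fully complementary and can be ligated."""
    return len(end_a) > 0 and end_a == reverse_complement(end_b)


if __name__ == "__main__":
    xho_end = partial_fill_in("TCGA", {"T", "C"})   # XhoI end with dTTP + dCTP
    sau_end = partial_fill_in("GATC", {"G", "A"})   # Sau3AI end with dGTP + dATP
    print("XhoI end after partial fill-in:", xho_end)
    print("Sau3AI end after partial fill-in:", sau_end)
    print("self-compatible:", can_anneal(xho_end, xho_end), can_anneal(sau_end, sau_end))
    print("compatible with each other:", can_anneal(xho_end, sau_end))
```

In this model neither two-base end can pair with itself, so neither fragment population can self-ligate, while the two ends pair with each other, which is the property exploited for library construction.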

Such a technique has been used to produce high-titer
gene libraries and constructs (Hung et al. 1984, Nuc Acids
Res 12:1863-1874; Zaborovsky et al. 1986, Gene 42:119;
Foster, 1991, Ph.D. thesis, University of California, Santa
Barbara; Loftus et al. 1992, Biotechniques 12:172-175).

The fill-in reaction can be carried out with Klenow 
buffer (50 mM Tris-HCl pH 7.5, 10 mM MgCl2, 50 mg/ml BSA, 1 mM
dNTP), enzyme (10 U/50 μl reaction), and an incubation of 3-4
hours at 37°C. After the reaction, the DNA may be purified
by a variety of methods, including but not limited to,
affinity chromatography, ethanol precipitation, and
spin-column centrifugation.

5.3.5. PROTOCOLS FOR PREPARATION OF SUBTRACTED DNA
PROBES FOR PRE-SCREENING

RNA may be isolated from young, mid log-phase cultures 
of organisms with complex life cycles that have not undergone 
differentiation. This RNA pool is complementary to genes
involved in undifferentiated growth and primary metabolism. 
The RNA is biotinylated in vitro and hybridized in excess to 
randomly sheared, gene-sized fragments of genomic DNA from 
the homologous or closely related heterologous species. 
Phenol extraction of this mixture results in the removal of
genomic sequences complementary to primary metabolism RNA at 
the interface. This process may be repeated once. The 
resulting single stranded DNA fragments are composed of the 
(+) strand of primary metabolism genes and the (+) and (-) 
strands of other genes, including secondary metabolism- 
related genes. This mixture of DNA is denatured, and 
rehybridized for 5-10 half C0t values under highly stringent
conditions such that only related sequences can rehybridize
to form double-stranded DNA. The remaining single-stranded
DNA can be removed by binding to hydroxyapatite or by
digestion with mung bean nuclease. The isolated
double-stranded DNA representing non-primary metabolism related
genes may then be labeled using random priming, and used as 
probe to pre-screen a library. 

5.3.6. PURIFICATION OF NUCLEIC ACIDS FROM SOIL OR
OTHER MIXED ENVIRONMENTAL SAMPLES

Soil samples are flash frozen in liquid nitrogen and 
stored at -70°C until processed. Alternatively, soil samples
are stored frozen at -20°C. Samples are either thawed on ice 
immediately prior to use, or freeze-dried prior to 
processing. 

Total nucleic acids are extracted by a number of 
protocols with minor modifications depending on the physical 
state and source of the material. Dry to semi-dry samples
are frozen and processed directly; very wet samples are flash
frozen and freeze-dried; oily samples are diluted with
phosphate buffered saline prior to processing. Any of the
following procedures may be adapted: Ogram et al. 1987, J.
Microbiol. Meth. 7:57-66; Steffan et al. 1988, Appl. Environ.
Microbiology, 54:137-161; Werner et al. 1992, J. of Bact.
174(15):5072-5078; Zhou et al. 1996, Appl. Environmental
Microbiol. 62(2):316-322.

Briefly, 5 g samples are lysed directly by dropwise
addition to hot guanidinium isothiocyanate lysis buffer (see
Section 5.3.1), and subjected to a cesium chloride
purification. Alternatively, the samples are mixed with 13.5
ml of DNA extraction buffer (100 mM Tris-HCl pH 8.0, 100 mM
EDTA, 100 mM sodium phosphate, 1.5 mM NaCl, 1% CTAB
(hexadecyltrimethylammonium bromide)) and 100 µl of 20 mg/ml
proteinase K in 50 ml centrifuge tubes and shaken
horizontally at 225 rpm for 30 minutes at 37°C. After
shaking, 1.5 ml of 20% SDS is added, and the samples are
incubated at 65°C for 2 hours, with end-over-end shaking
every 15-20 minutes. The supernatants are collected by
centrifugation at 6000 x g for 10 minutes at 20°C. The
pellets are re-extracted 3X by adding 4.5 ml of extraction
buffer and 0.5 ml of 20% SDS, vortexing for 1 minute,
followed by a 10 minute incubation at 65°C and
recentrifugation. Pooled supernatants from 3 extractions are
extracted twice with chloroform-isoamyl alcohol (48:1). The
nucleic acids are precipitated by the addition of 0.6 volumes
of isopropanol followed by a one hour incubation and
centrifugation at 16,000 x g for 20 minutes at room
temperature. The crude nucleic acid pellets are then
resuspended in 10 mM Tris-HCl pH 8.0, 2 mM EDTA. Further
purification of the DNA is by DEAE chromatography if needed.
Total RNA is obtained from the crude pellet by selective
precipitation of RNA by 4M lithium acetate or acid phenol
extraction (Ausubel et al. 1990, Greene Publishing Associates
and Wiley Interscience, New York; Hoben et al. 1988, Appl.
Environ. Microbiol. 54:703-71).

5.3.7. REPAIR OF DNA

Nicked or degraded DNA samples are repaired by
first blunting any fragmented ends with T4 DNA polymerase
(New England Biolabs). The DNA is treated in blunting buffer
(50 mM Tris-HCl pH 7.8, 10 mM MgCl2, 40 µM dNTPs, 5 U/10 µg
T4 DNA polymerase) for 1-2 hours at 37°C. The DNA is ethanol
precipitated by the addition of 1/10 volume of 3M sodium
acetate and 2.5 volumes of 100% ethanol.
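
As a worked illustration of the precipitation step above, the following minimal sketch computes the reagent volumes for a given sample volume. The function name is hypothetical, and it is assumed that both the 1/10 volume of sodium acetate and the 2.5 volumes of ethanol are taken relative to the starting sample volume, which the text does not state explicitly.

```python
def ethanol_precipitation_volumes(sample_ul):
    """Return (sodium_acetate_ul, ethanol_ul) for a DNA sample of sample_ul microliters."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    # 1/10 volume of 3 M sodium acetate
    sodium_acetate_ul = sample_ul / 10
    # 2.5 volumes of 100% ethanol (assumed relative to the starting sample)
    ethanol_ul = 2.5 * sample_ul
    return sodium_acetate_ul, ethanol_ul


acetate, ethanol = ethanol_precipitation_volumes(200)
print(f"{acetate:.0f} ul 3 M sodium acetate, {ethanol:.0f} ul ethanol")
```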

After centrifugation and resuspension in water, the DNA
sample is treated with E. coli DNA ligase in E. coli ligase
buffer (50 mM Tris-HCl pH 7.8, 10 mM MgCl2, 10 mM DTT, 26 µM
NAD+, and 25 µM BSA, 10 U of E. coli DNA ligase) for 1-2 hours
at 16°C. After treatment the DNA sample is diluted 5 fold with a
solution of 20 mM Tris-HCl pH 8.0, 0.3M sodium acetate and
extracted once with phenol and once with chloroform. The
addition of 2.5 volumes of ethanol to the aqueous phase
precipitates the DNA. The samples are rinsed two times with
70% ethanol and resuspended in sterile water or 10 mM Tris-
HCl, pH 8.0, 1 mM EDTA and frozen at -70°C until used.

5.4. PROTOCOLS FOR PROKARYOTIC EXPRESSION LIBRARIES

The procedures for preparing natural pathway expression
libraries and chimeric pathway expression libraries using
prokaryotic host and donor organisms are provided in this
section. Purified high quality DNA obtained by the
techniques described in Sections 5.3.1 - 5.3.4 may be used in
the following procedures.

5.4.1. BACTERIAL SPECIES, STRAINS, AND CULTURE 
CONDITIONS 

Particularly good expression host organisms are 
restriction-minus, endonuclease deficient, and recombination 
deficient. For E. coli, a preferred strain is XL1-MR
(genotype: McrA-, McrCB-, McrF-, Mrr-, hsdR-, endA1-, recA-).
For Streptomyces, a preferred strain is S. lividans TK64.
For Bacillus subtilis, preferred strains are B. subtilis
PB168 trpC2; B. subtilis PB5002 sacA, degUhy; B. subtilis
PB168delta trpC2, pksdelta 75.8; B. subtilis ATCC 39320 and
39374.

The donor organisms are bacterial species. Some are
selected for the ability to produce a unique compound that is
detectable by current assays; others are selected due to
their presence in an environmental sample of potential
interest. In some examples, marine bacteria were obtained
from Harbor Branch Oceanographic Institute and Scripps
Institute of Oceanography. They were generally collected
from international waters more than 200 miles offshore.
Metabolic tests as well as gram testing and colony
morphologies were performed to the level necessary to ensure
that the samples are taxonomically diverse.

E. coli are grown at 37°C when preparing library stocks,
and at 30°C for expression. Marine, Actinomyces and
Streptomyces species are grown only at 30°C.



5.4.2. PREPARATION OF DONOR GENOMIC DNA

From each species of bacteria, a 10 mL culture is grown.
The bacteria are pelleted by centrifugation and resuspended
in 10 mM Tris, 5 mM EDTA (TE). The DNA may be purified by the
procedures described in Section 5.1.2., or the bacterial
pellet may be solubilized in SDS/proteinase K, extracted by
phenol:chloroform, and precipitated with isopropanol. The
resulting purified DNA is resuspended overnight in TE.

Aliquots of each purified DNA are subjected to agarose
gel electrophoresis to confirm integrity and to determine the
DNA concentration.

To prepare random large DNA fragments for the natural
pathway expression library, 20 µg of DNA for each species is
partially digested with a frequent-cutting enzyme, such as
Sau3A, by incubating in 1X enzyme buffer and 0.01-0.5 unit
enzyme per µg DNA for 1 hour at 37°C. The amount of enzyme
used may be determined empirically to generate the desired
size range. The digested DNAs are pooled, phenol:chloroform
extracted, and ethanol-precipitated. 100 µg of this mixture is
used for each library that requires large native fragments of
genomic DNA. This mixture can optionally be size-
fractionated through sucrose gradients. Smaller fragments of
DNA for the chimeric pathway expression library can
simultaneously be selected by size fractionation.
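
Because the amount of enzyme is determined empirically, a pilot titration across the stated range of 0.01-0.5 unit per µg may be laid out as in the following sketch. The geometric spacing, the number of steps and the function name are assumptions made for illustration only; they are not part of the procedure described above.

```python
def sau3a_titration(dna_ug=20.0, low=0.01, high=0.5, steps=5):
    """Enzyme amounts (units) for a pilot partial-digestion series.

    Spans low..high units per ug of DNA in geometric steps, so each
    reaction uses a constant multiple of the previous enzyme amount.
    """
    if steps < 2:
        raise ValueError("at least two steps are needed to span a range")
    ratio = (high / low) ** (1 / (steps - 1))
    return [round(dna_ug * low * ratio**i, 3) for i in range(steps)]


for units in sau3a_titration():
    print(f"{units} units Sau3A per 20 ug DNA")
```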

The digestion and size fractionation are confirmed by
subjecting aliquots of the samples to agarose gel
electrophoresis.

5.4.3. GENERATION OF PROKARYOTIC PROMOTER FRAGMENTS 
In one example, synthetic oligonucleotides are used to
construct a fragment containing two copies of the beta-
galactosidase promoter (lac), one on either side of a unique
BamHI site, with each copy of lac positioned to direct
transcription toward the centered BamHI site (Figure 4A).
The synthetic oligonucleotides are phosphorylated by the
synthesizer. 400 ng of each oligonucleotide is annealed by
boiling five minutes and slow cooling over 30 minutes to 25°C
before ligating 30 minutes at room temperature with T4 DNA
ligase. The ligation mix is subjected to agarose gel
electrophoresis and 2-7 kbp fragments are excised and
purified by Gene Clean. The joined, paired, and properly-
oriented cassettes are inserted into the SmaI site of the
pBSK plasmid vector by incubation for 16 hours at 15°C with
T4 DNA ligase in 1X ligase/PEG buffer. The ligation mix is
introduced into XL1-MR cells. Individual clones are analyzed
by restriction enzyme analysis and may optionally be
sequenced to confirm orientation and accuracy.

The pBSK-(lac/lac)n clones (where n is an integer from 2
to 10) are cultured in 0.3 liter quantities and the plasmids
purified using a plasmid preparation kit (Qiagen). 40 µg of
the selected and purified pBSK-(lac/lac)n is digested to
completion with SmaI in 1X buffer. The digested DNA is
subjected to agarose gel electrophoresis and the lac/lac
promoter dimers are excised and purified with Gene Clean, and
digested to completion with BamHI in 1X buffer. See Figures
4B and 4C. The digested promoter monomers are
phenol:chloroform extracted, ethanol precipitated, and
dephosphorylated by treatment with CIAP in 1X CIAP buffer.
The dephosphorylated, digested promoters are extracted and
precipitated as before, and resuspended in TE at a
concentration of 20 ng/µl before storing at -20°C or further
use.

In another example, prepared promoter fragments are
mixed with similarly-prepared linkers that do not contain
promoter sequences, and then used in ligations with the donor
genomic DNA. This allows the generation of cassettes with
only one promoter, in cases where anti-sense transcription is
a consideration.

5.4.4. PREPARATION OF GENE CASSETTES FOR
COMBINATORIAL CHIMERIC PATHWAY EXPRESSION
LIBRARIES

In one example, BamHI-BamHI fragments of genomic DNA
(mean size 3.5 kbp) are mixed with an excess of
dephosphorylated promoter fragments, and then ligated. The
molar ratio of promoters to genomic DNA fragments is 20:1.
The resulting units (lac/genomic DNA fragment/lac) will
have a mean size of approximately 4 kbp. Other prokaryotic
promoters that may be used include other E. coli promoters
(Harley et al., 1987, Nuc Acid Res 15:2343-2361), and
Streptomyces promoters (Strohl 1992, Nuc Acid Res 20:961-
974) for use in Streptomyces species expression hosts. In
hosts with undetermined or significant recombination ability,
it is desirable to use a series of different promoters such
that any clone containing several cassettes will contain
several different promoters.

5.4.5. PREPARATION OF SOLID SUPPORT
Ultralink Immobilized Streptavidin beads were purchased 
from Pierce (Cat. No. 53113). 3M Emphaze Biosupport Medium 
AB1 "blank beads" was purchased from Pierce (Cat. No. 53112). 
Similar solid supports from other vendors may be substituted 
for this procedure. 

Oligonucleotides were purchased from Life Technologies 
(Gibco-BRL). Oligonucleotide "Bead-link-5" is 5' biotin-GCC
GAC CAT TTA AAT CGG TTA AT 3'. "Bead-link-3" is 5'
phosphate-TAA CCG ATT TAA ATG GTC GGC 3'. When annealed,
these oligonucleotides contain a SwaI restriction
endonuclease site (ATTTAAAT). Annealed bead-link
oligonucleotides also leave an AT overhang at the 3'
end of oligonucleotide bead-link-5, as shown in the
duplex below.

5' biotin-GCC GAC CAT TTA AAT CGG TTA AT 3'
3'        CGG CTG GTA AAT TTA GCC AAT-phosphate 5'
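
The properties claimed for the annealed bead-link pair can be checked directly from the two sequences. The following sketch, with hypothetical variable and function names, confirms that Bead-link-3 is the reverse complement of the first 21 bases of Bead-link-5, that the SwaI recognition sequence ATTTAAAT is present, and that a two-base AT overhang remains; the biotin and phosphate modifications are omitted from the strings.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


BEAD_LINK_5 = "GCCGACCATTTAAATCGGTTAAT"  # 5' biotin omitted
BEAD_LINK_3 = "TAACCGATTTAAATGGTCGGC"    # 5' phosphate omitted
SWAI_SITE = "ATTTAAAT"

duplex_len = len(BEAD_LINK_3)
# The duplex region of Bead-link-5 must pair with all of Bead-link-3.
fully_paired = BEAD_LINK_5[:duplex_len] == reverse_complement(BEAD_LINK_3)
# Whatever is left at the 3' end of Bead-link-5 is single-stranded.
overhang = BEAD_LINK_5[duplex_len:]

print("fully paired:", fully_paired)
print("SwaI site present:", SWAI_SITE in BEAD_LINK_5)
print("3' overhang on Bead-link-5:", overhang)
```
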
Equimolar amounts of each bead-link oligonucleotide are
mixed together in an Eppendorf tube. 5M NaCl is added to the
tube to a final concentration of 300 mM. The reaction is
incubated at 60°C for 1.5 hr. Annealing is confirmed by
agarose gel electrophoresis using non-annealed
oligonucleotides as a control.
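
The volume of 5M NaCl needed to reach 300 mM follows from a simple mass balance. The sketch below is illustrative only: the function name and the 100 µl example volume are assumptions, and the oligonucleotide mixture is assumed to contain no NaCl before the addition.

```python
def stock_volume_to_add(sample_ul, stock_mM=5000.0, final_mM=300.0):
    """Volume of stock (ul) to add so that the mixture reaches final_mM.

    Solves stock_mM * v = final_mM * (sample_ul + v) for v, assuming
    the sample itself contributes no NaCl.
    """
    if not 0 < final_mM < stock_mM:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_ul * final_mM / (stock_mM - final_mM)


v = stock_volume_to_add(100.0)
print(f"add {v:.2f} ul of 5 M NaCl to 100 ul of oligonucleotide mix")
```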

To prepare blank beads, 100 mg dry beads was resuspended
in 1 ml phosphate buffered saline (PBS). Bovine Serum Albumin
(BSA) was added to a final concentration of 1 mg/ml. Beads were
rotated for 4 hrs at room temperature. Beads were pelleted
by centrifugation and washed 3x with 1M Tris-HCl pH 8.0 for 2
hours at room temperature to block unreacted azalactone
sites. Beads were pelleted by brief centrifugation and were
washed extensively with PBS. Blank beads were stored in PBS
at 4°C until used.

To bind bead-link oligonucleotide to streptavidin beads,
10 µg previously-annealed oligonucleotides were mixed with
20 µl Ultralink Immobilized Streptavidin beads in 1x binding
buffer (PBS, 500 mM NaCl). Beads were incubated for three
hours at room temperature with inversion to keep the beads
suspended. Beads are pelleted and washed 3x with 1 ml binding
buffer. Beads are then washed and equilibrated with 1x
ligation buffer (50 mM Tris-HCl pH 7.8, 10 mM MgCl2, 10 mM
dithiothreitol, 1 mM ATP, 25 µg/ml BSA). Beads are stored at
4°C until used.

5.4.6. ASSEMBLY OF A COMBINATORIAL CHIMERIC 
PATHWAY EXPRESSION LIBRARY 

Attachment of gene cassettes to magnetic beads: The gene
cassettes are phosphorylated using T4 polynucleotide kinase
in 1X kinase buffer. The phosphorylated fragments are
ethanol precipitated and resuspended in TE. 1/10 of this is
ligated to a mixture of two short non-phosphorylated
synthetic linkers. The remaining 9/10 is used for a later
procedure. Each linker contains the site for one of two
rare-cutting enzymes, either NotI or SrfI. In addition, the
NotI-containing linker is biotinylated at the time of synthesis of
the oligonucleotides. The NotI and SrfI linkers are mixed
with the phosphorylated transcription units in the ratio,
respectively, of 100:100:1, and ligated with T4 DNA ligase in
1X ligase/PEG buffer for 16 hours at 15°C. This mixture is
allowed to bind to avidin-conjugated MPG magnetic beads, and
the manufacturer's protocols are used to remove the bead-
bound transcription units from the ligation mixture.

In the mixture of ligated DNA, approximately 1/2 will
have a biotinylated NotI linker placed at one end and a SrfI
linker at the other end. The NotI ends will be bound to the
beads by avidin-biotin linkages. The fragments with NotI
linkers at both ends are not involved in further addition
steps. The fragments with SrfI linkers at both ends are not
retained in the magnetic separation step.
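
The stated fraction of approximately 1/2 is what would be expected if each end of a transcription unit independently receives a NotI or a SrfI linker with equal probability. That independence is an assumption suggested by the equimolar linker ratio rather than something the text states, and the function name below is hypothetical. A minimal sketch enumerating the three outcomes:

```python
from fractions import Fraction
from itertools import product


def end_pair_fractions(p_not=Fraction(1, 2)):
    """Fraction of fragments carrying each unordered pair of end linkers.

    Assumes each end is ligated independently, to a NotI linker with
    probability p_not and to a SrfI linker otherwise.
    """
    probs = {"NotI": p_not, "SrfI": 1 - p_not}
    fractions = {}
    for left, right in product(probs, repeat=2):
        key = "/".join(sorted((left, right)))
        fractions[key] = fractions.get(key, 0) + probs[left] * probs[right]
    return fractions


for pair, fraction in end_pair_fractions().items():
    print(pair, fraction)
```

Under these assumptions, one quarter of the fragments carry NotI linkers at both ends and one quarter carry SrfI linkers at both ends, which corresponds to the two unproductive classes described above.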

Preparation of pool of DNA for addition to bead-bound
DNA: The remaining 9/10 of the phosphorylated transcription
units are ligated as above, but to the SrfI linkers only,
followed by digestion to completion with SrfI,
dephosphorylation, purification and ethanol precipitation.

De-protection of bead-bound DNA: Transcription units
bound to the beads are digested to completion with the SrfI
enzyme in 1X SrfI buffer. The reaction is heat-inactivated
and the beads are removed by magnetic separation.

Concatenation: The beads are then added to a ligation
mix containing the dephosphorylated SrfI-SrfI digested
transcription units in 1X ligation buffer. Ligations are
commenced by addition of T4 DNA ligase and proceed for 60
minutes at 25°C, before heat-inactivation of the ligase and
magnetic separation of the beads. Ligations will primarily
occur between phosphorylated bead-bound DNA and non-
phosphorylated transcription units. The transcription units
on the bead are phosphorylated by T4 polynucleotide kinase,
heat-inactivated, magnetically-separated, and returned to the
ligation mixture with the addition of more T4 DNA ligase.

This cycle is repeated ten times before cleaving the
polymer from the beads by digestion with NotI. The cleaved
DNA is ethanol precipitated, resuspended in TE, and viewed on
an agarose gel to gauge the quality and size range before
insertion into the SuperCos 1 or other vector, according to
the expression host. The concatemers are used to generate a
prokaryotic library in the relevant expression host as
described in Section 5.4.5.
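
Because the added units are dephosphorylated, each ligation round can extend a bead-bound chain by at most one unit, so ten cycles give at most eleven units per concatemer. The sketch below estimates the resulting size distribution under a simple binomial model. The per-cycle ligation efficiency of 0.8 is a purely hypothetical value, the function names are illustrative, and the unit size of 4 kbp is the mean cassette size given in Section 5.4.4.

```python
from math import comb


def concatemer_distribution(cycles=10, efficiency=0.8):
    """Probability of each final unit count after `cycles` ligation rounds.

    Assumes every round independently adds exactly one unit with
    probability `efficiency` (hypothetical value) and none otherwise.
    """
    return {
        1 + added: comb(cycles, added)
        * efficiency**added
        * (1 - efficiency) ** (cycles - added)
        for added in range(cycles + 1)
    }


def expected_size_kbp(cycles=10, efficiency=0.8, unit_kbp=4.0):
    """Mean concatemer size: the bead-bound starting unit plus expected additions."""
    return (1 + cycles * efficiency) * unit_kbp


dist = concatemer_distribution()
print(f"expected size: {expected_size_kbp():.0f} kbp")
print(f"largest possible concatemer: {max(dist)} units")
```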



5.4.7. ASSEMBLY OF A COMBINATORIAL NATURAL PATHWAY
EXPRESSION LIBRARY

The expression vector for an E. coli library is
desirably the cosmid SuperCos 1, capable of maintaining
inserts of 30-42 kbp in size. Insertion of the DNA fragments
into SuperCos 1 and packaging with Gigapack extracts are
performed according to the manufacturer's directions
(Stratagene).

Briefly, XL1-MR host cells are infected with SuperCos 1
phage containing the DNA library. This is performed as
follows: XL1-MR cells are grown overnight in 5 mL LB medium
with 1% maltose, 10 mM MgSO4 at 300 rpm, 37°C. The overnight
culture is diluted 1:10 and cultured 3 hours in LB/10 mM MgSO4
at 300 rpm, 37°C. The culture is pelleted by centrifugation
at 800 x g and resuspended in 5 mL LB. 600 µl of this suspension
is incubated with 500 cfu of library packaged in phage
particles for 30 minutes at ambient temperature, followed by
a 60 minute incubation with 8 vol LB at 300 rpm, 37°C.

In order to amplify the expression libraries, the
infected host cells are spread on 150 mm Petri dishes with
50 mL LB, 50 µg/mL ampicillin. The plates are previously dried
for 48 hours at ambient temperature. After spreading, the
plates are allowed to incubate overnight at 37°C. The plates
are scraped and the colonies resuspended with 3 mL
15% glycerol, 85% LB per plate. This bacterial suspension is
stored at -70°C for further use.

To prepare the libraries for screening individual
clones, the infected host cells are spread on 150 mm Petri
dishes with 50 mL LB, 50 µg/mL ampicillin. The plates are
previously dried for 48 hours at ambient temperature. After
spreading, the plates are allowed to incubate overnight at
37°C. Resulting colonies are picked with sterile toothpicks
and transferred one per well to multi-well plates. Each well
of a 384-well plate contains 75 µL LB, 50 µg/mL ampicillin, 7%
glycerol. The outer rows (80 wells total) are not inoculated
but are similarly filled with medium to provide an
evaporation barrier during subsequent incubation and
freezing. These inoculated master plates are placed at 37°C
for 16 hours without shaking. The overnight master 384-well
plates are used as a source plate to replicate into one or
more working 384-well plates or Omni-Trays. The master 384-
well plates are then sealed individually and frozen at -80°C.
Replication is done with a 384-pin replicator. Before and
after each use, the 384-pin replicator is dipped sequentially
into bleach for 20 seconds, water for 30 seconds, then
ethanol for 5 seconds before flaming. Methods of library
assembly are dependent on the selection of vector and
expression host.

5.4.8. PRE-SCREENING OF EXPRESSION LIBRARIES

There are three categories of pre-screens: intracellular,
differential, and selection.

Briefly, the first category, intracellular pre-screening,
entails introduction of the library into a host engineered to
contain a chemo-responsive reporter construct. The reporter
is GFP (green fluorescent protein) or β-galactosidase, and
selection is done by fluorescence-activated cell sorting
(FACS) or macrodroplet sorting.

The second category, differential pre-screening, entails
incubation of the library in the host with fluorescent or
chromogenic physiological tracers, followed by FACS or
macrodroplet sorting.

The third category, selection pre-screening, entails 
incubation of the library in the host with selective agents 
such as antibiotics, followed by FACS or macrodroplet sorting 
to identify surviving or multiplying cells. 

For all methods, cell sorting is done on bulk cultures
of amplified libraries prior to examination of individual
cultures.

The libraries may be pre-screened by FACS or 
macrodroplet sorting. Pools of host cells containing the DNA 
libraries are cultured in one of two formats promoting either 
high or low density micro-environments. 

In the first format, cells of the amplified library are
examined as individual cells. An E. coli library aliquot is
grown for 4 hours at 30°C in 20 vol medium at 300 rpm before
pelleting, resuspension in 1 vol sterile ddH2O, incubation
with fluorescent probes (as needed), and placement on ice for
transfer to the FACS facilities.

In the second format, aliquots of the amplified library
are encapsulated and cultured in the presence of substrates
or selection agents as described in Section 5.2.3 before
transfer to the FACS or macrodroplet sorting facilities.

For cultures to be examined with fluorescent tracers or
substrates, the cultures, resuspended in ddH2O, are stained
before FACS following the manufacturer's protocols, typically
as follows: incubations are in the dark, at room temperature,
for 15 minutes, followed by pelleting for 5 minutes in a
1.5 mL microfuge tube and resuspension in 1 vol cold ddH2O.

After sorting, pools of 1-1000 selected clones or
macrodroplets from the expression libraries are cultured in
0.5 L nutrient media. The cultured bacteria and media are
processed for chemical analysis by extraction with 0.5 L ethyl
acetate. Rotary evaporation yields a crude organic extract
of approximately 20 mg-1 g extract per liter culture. The
cognate cloned DNAs are purified and re-transformed into host
cells to confirm the localization of relevant sequences to
the cosmid. Chemical samples generated by expression from
library clones may be examined by HPLC using a series of
columns (cationic, anionic, reverse phase) and subsequently
by qualitative chemical analysis using NMR.
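
For planning the downstream HPLC and NMR work, the per-liter yield range can be scaled to the 0.5 L culture volume used here. The following sketch does only that arithmetic; the function name is hypothetical, and the yield is assumed to scale linearly with culture volume.

```python
def expected_extract_mg(culture_l, low_mg_per_l=20.0, high_mg_per_l=1000.0):
    """Return the (low, high) expected crude extract mass in mg.

    Scales the stated range of roughly 20 mg to 1 g per liter of
    culture linearly with the culture volume (an assumption).
    """
    if culture_l <= 0:
        raise ValueError("culture volume must be positive")
    return culture_l * low_mg_per_l, culture_l * high_mg_per_l


low, high = expected_extract_mg(0.5)
print(f"expected crude extract from 0.5 L: {low:.0f}-{high:.0f} mg")
```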

5.4.9. METABOLIC TESTING OF MARINE GRAM(-)/E. COLI
LIBRARY BY PLATE REPLICATION

Each wild-type marine species is tested prior to 
preparation of the DNA libraries to prevent redundancy and to 
help determine the array of metabolic tests to be done on the 
completed libraries. 

To prepare the libraries for screening individual
clones, the infected host cells, such as E. coli XL1-MR, are
spread on 150 mm Petri dishes with 50 ml LB, 50 µg/ml
ampicillin. The plates are previously dried for 48 hours at
ambient temperature. After spreading, the plates are allowed
to incubate overnight at 37°C. Resulting colonies are picked
with sterile toothpicks and transferred one per well to
384-well plates. Each well contains 75 µl LB, 50 µg/ml
ampicillin, 7% glycerol. The outer rows (80 wells total) are
not inoculated but are similarly filled with medium to
provide an evaporation barrier during subsequent incubation
and freezing. These inoculated master plates are placed at
37°C for 16 hours without shaking. The overnight master
384-well plates are used as a source plate to replicate into
one or more working multi-well plates or Omni-Trays. The
master 384-well plates are then sealed individually and
frozen at -80°C. Replication is done with a 384-pin
replicator. Before and after each use, the 384-pin
replicator is dipped sequentially into bleach for 20 seconds,
water for 30 seconds, then ethanol for 5 seconds before
flaming.

Working multi-well plates or Omni-Trays are used as 
source plates to replicate the DNA libraries onto a series of 
differential and/or selective media (e.g. siderophore
detection media or antimicrobial lawns). The results are
compiled and compared to the profiles of the wild-type marine
bacteria used to construct the DNA library.

5.4.10. METABOLIC TESTING OF MARINE GRAM(-)/E. COLI
LIBRARY BY MACRODROPLET ENCAPSULATION

Clones are encapsulated by dissolving sodium alginate
in 100 mL of sterile water at a concentration of
1% using an overhead mixer at 2000 rpm. A volume of library
suspension is added so as to embed 1-5 clones per droplet.
The mixture is allowed to sit for at least 30 minutes to
degas. The mixture is then extruded through any device that
allows it to form individual droplets. One such example is a
syringe with a 25 gauge needle. The droplets are dropped into a
gently stirring beaker of 135 mM calcium chloride. Droplets
are allowed to harden for 10 minutes and then are transferred
to a sterile flask and the calcium chloride removed and
replaced with LB/Amp media and a substrate (e.g.
x-glucosidamine). Flasks containing the droplets are then
shaken at 30°C overnight and examined the following morning
for positive clones indicated by the presence of blue
colonies.

Droplets are placed in a single layer in a large clear
tray and scanned by eye. Positive colonies are removed and
placed in 96-well master plates containing LB/Amp and 50 mM
sodium citrate pH 7.4 to dissolve the droplet, and allowed to
grow at 37°C overnight. These overnight master 96-well plates 
are used as a source plate to replicate into one or more 
working multi-well plates or Omni-Trays. The master 96-well 
plates are then sealed individually and frozen at -80°C. 
Positive clones can then be either sent for specific testing 
of the products or sent through another round of 
pre-screening or screening. Further screening may be 
performed by replication which is done with a multi-pin 
replicator. Before and after each use, the multi-pin
replicator is dipped sequentially into bleach for 20 seconds,
water for 30 seconds, then ethanol for 5 seconds before
flaming.



5.4.11. METABOLIC TESTING OF MARINE GRAM(-)/E. COLI
LIBRARIES BY MICRODROPLET ENCAPSULATION

Microdroplets may be generated by the following method.
Using an overhead mixer at 2000 rpm, 0.6 g sodium
polyphosphate and 2% sodium alginate are dissolved in 100 ml
sterile water. This mixture is allowed to degas for 60
minutes. Then 1.9 g calcium sulphate is sonicated in 10 ml
50% glycerol for at least 15 minutes. This slurry and a
volume of the library suspension which will yield 1-5 cells
per droplet are blended into the alginate solution
immediately before introduction to an oil phase (olive oil)
which has been premixed with the addition of 1.0 g purified
soy bean lecithin for at least 30 minutes. The
emulsification process is initiated by slowly transferring
the alginate mixture into the oil phase and mixing for 10
minutes at 580 rpm. 500 ml sterile water is then added and
the mixing allowed to continue for 5 minutes. Microdroplets
can then be removed from the oil by centrifugation and washed
and resuspended in LB/Amp. For the purpose of sorting by
FACS, if the droplets are outside of the desired size range
necessary for sorting, the droplets can be size selected
using a filter membrane of the required size limit. Clones
can then be grown 2 hours at 30°C with shaking in LB/Amp
media containing a fluorescent substrate.

Following incubation the sample is prepared for sorting
with FACS by centrifuging, washing and resuspending in
sterile water at a density of 1 x 10^6 droplets per ml. The
size of the droplets can be examined by phase microscopy.
FACS services are performed by a qualified operator on a
Becton-Dickinson FACStar Plus and positives are sorted
directly into multi-well plates containing LB/Amp, isolating
positives to 1 clone per well. These plates are allowed to
grow at 37°C until the colonies grow out of the beads (1-2
days). These overnight plates are used as a source plate to
replicate into one or more working multi-well plates or
Omni-Trays. The master multi-well plates are then sealed
individually and frozen at -80°C. Positive clones can then
be either sent for specific testing of the products or sent
through another round of pre-screening or screening. Further
screening may be performed by replication which is done with
a 96 or 384-pin replicator. Before and after each use, the
replicator is dipped sequentially into bleach for 20 seconds,
water for 30 seconds, then ethanol for 5 seconds before
flaming.

5.4.12. METABOLIC TESTING OF
ACTINOMYCETES/STREPTOMYCES LIVIDANS LIBRARY
BY PLATE REPLICATION

Each cultivable wild-type actinomycete species is tested
prior to preparation of the DNA libraries to prevent
taxonomic redundancy, and to help determine the array of
metabolic tests to be done on the completed libraries. To
prepare the libraries for screening individual clones, the
transformed host cells, Streptomyces lividans TK66, are
spread on 150 mm Petri dishes with F10A. The plates are
previously dried for 48 hours at ambient temperature. After
spreading, the plates are allowed to incubate overnight at
30°C. Selection is initiated by overlaying with
thiostrepton. Resulting colonies are picked with sterile
toothpicks and transferred one per well to 96-well plates.
Each well contains F10A media. These inoculated master
plates are placed at 30°C for 1-4 days. The overnight master
96-well plates are used as a source plate to replicate into
one or more working multi-well plates or Omni-Trays. The
master 96-well plates are then sealed individually and
frozen at -80°C. Replication is done with a multi-pin
replicator. Before and after each use, the multi-pin
replicator is dipped sequentially into bleach for 20 seconds,
water for 30 seconds, then ethanol for 5 seconds before
flaming.

Working multi-well plates or Omni-Trays are used as 
source plates to replicate the DNA libraries onto a series of 
differential and/or selective media (e.g. antibiotic plates
or antimicrobial lawns). The results are compiled and
compared to the profiles of the wild-type bacteria used to 
construct the DNA library. 

5.4.13. METABOLIC TESTING OF
ACTINOMYCETES/STREPTOMYCES LIVIDANS LIBRARY
BY MACRODROPLET ENCAPSULATION

Clones are encapsulated by the method as described in 
Section 5.4.10 for E. coli libraries. Droplets are allowed 
to harden for 10 minutes and then are transferred to a 
sterile flask and the calcium chloride removed and replaced 
with F10A media and a substrate (e.g. X-gal). Flasks
containing the droplets are then shaken at 30°C for 1-5 days
and examined for positive clones indicated by the presence of
blue colonies.

Droplets are placed in a single layer in a large clear
tray and scanned by eye. Positive colonies are removed and
placed in 96-well master plates containing F10A and 50 mM sodium
citrate pH 7.4 to dissolve the droplets and then grown at
30°C for 2 days. These master 96-well plates are
used as a source plate to replicate into one or more working
multi-well plates or Omni-Trays. The master 96-well plates
are then sealed individually and frozen at -80°C. Positive
clones can then be either sent for specific testing of the
products or sent through another round of pre-screening or
screening. Further screening may be performed by replication
as described above in Section 5.4.9.

5.4.14. PRE-SCREENING OF CLONES BY CO-ENCAPSULATION 
WITH INDICATOR CELLS 

Pools of library clones are titered by plating
appropriate dilutions and performing colony counts. Adequate
library cells are mixed in 1% alginate to result in
approximately 1 cell per macrodroplet. In addition, adequate
indicator cells are included to result in approximately 50
target cells per droplet. Macrodroplets are produced as
described in Section 5.4.10, and cultured under appropriate
conditions for the library and indicator cells.
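
A target of approximately 1 library cell per macrodroplet is an average, so individual droplets will vary. If cells are assumed to distribute among droplets at random, which is a modelling assumption not made in the text, occupancy follows a Poisson distribution. The sketch below, with hypothetical function names, estimates the share of droplets that are empty, singly occupied, or multiply occupied.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability that a droplet holds exactly k cells for a given mean load."""
    return mean**k * exp(-mean) / factorial(k)


def occupancy_summary(mean_library_cells=1.0):
    """Fractions of droplets with zero, one, or several library cells."""
    empty = poisson_pmf(0, mean_library_cells)
    single = poisson_pmf(1, mean_library_cells)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


for label, fraction in occupancy_summary().items():
    print(f"{label}: {fraction:.3f}")
```

At a mean load of 50 indicator cells per droplet, the same model predicts that droplets lacking indicator cells are vanishingly rare, so the library cell loading is the limiting factor.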

In general, S. lividans library macrodroplets are
cultured at 30°C in R5 or F10A, and E. coli library
macrodroplets are cultured at 30-37°C in LB or B3. The media
and temperature may be adjusted to accommodate the
physiological needs of the indicator cells. To visualize
the effects that the library cells have on the indicator cells, the
following reporter regimens are utilized: to detect cell
death, inclusion of neutral red or congo red; to detect cell
viability, inclusion of substrate relevant to indicator cell
(e.g., X-glucopyranoside for E. faecalis); to detect β-
galactosidase reporter activity in response to promoter
activation, inclusion of 80 mg/ml X-gal in culture media.
After isolation of positive macrodroplets as described in
Section 5.4.10, indicator cells are eliminated by addition of
antibiotics that are selective for the library cells but not
the indicator cells. The library cells are then stored
and/or further examined as desired.

5.5. PROTOCOLS FOR EUKARYOTIC EXPRESSION LIBRARIES
This section describes procedures that may be generally
applied to prepare combinatorial gene expression libraries of
eukaryotic donor organisms. The steps involved in the
preparation of a combinatorial chimeric pathway gene
expression library in eukaryotes are shown in Figures 5A-5G.

Particularly good eukaryotic expression host organisms
are stable, non-filamentous, and characterized sufficiently
so as to be genetically manipulatable for the purposes of
gene expression. For yeast and fungi, a preferred species is
S. pombe, which is grown at 30°C (C. Guthrie and G.R. Fink,
Guide to Yeast Genetics and Molecular Biology, Methods in
Enzymology, Vol. 194, Academic Press). For plants, A. thaliana
and N. tabacum cells are preferred hosts (C.P. Lichtenstein & J.
Draper, Genetic Engineering of Plants, DNA Cloning Vol. II,
pp. 67-119).



5.5.1. REMOVAL OF SATELLITE GENOMIC DNA BY DENSITY
GRADIENT CENTRIFUGATION

Eukaryotic genomes often have large amounts of
repetitive DNA which consists primarily of ribosomal coding
regions, or sequences of no apparent function. Thus, in
preparing genomic DNA from eukaryotic donor organisms, it may
be desirable to exclude such non-coding DNA sequences from a
library. Standard CsCl genomic DNA purification methods in
the presence of the DNA binding dye, Hoechst 33258 (Cooney &
Matthews, 1984) may be used to separate out various classes
of genomic DNA prior to cloning.

5.5.2. GENERATION OF EUKARYOTIC PROMOTERS AND 
TERMINATOR FRAGMENTS 

Both promoter and terminator gene fragments may be
produced by PCR using sequence-specific primers adapted from
published sequences of known promoters and terminators. The
choice of promoter and terminator sequences can be determined
by the host organism used. For instance, if S. pombe is used
as an expression host, both native promoters, such as nmt1
or ura4, and non-native promoters, such as those derived from
viruses, e.g., CMV, SV40 (Forsburg, 1993 Nuc Acid Res.
8:4321-4325), or from humans, e.g., chorionic gonadotropin or
somatostatin (R. Toyama, H. Okayama 1990, FEBS Letters 268(1)
pp. 217-221), may be used. Genetically engineered promoters
similar to those found in the inducible tetracycline system
(Faryar et al. 1992, Curr Genet 21:345-349) may also be used.

PCR reactions may be performed in a commercially
available PCR machine using standard PCR reaction conditions
and DNA polymerases of high fidelity and throughput, such as
but not limited to, Pfu polymerase (Stratagene) or Vent
polymerase (New England Biolabs). Since not all primer sets
will use the same reaction conditions, precise conditions may
be determined empirically by techniques known in the art.
PCR oligonucleotide primers may be obtained commercially or
synthesized by methods well known in the art.

The promoter and terminator fragments generated by PCR 
may comprise restriction sites at the 5' ends. Bgl II, Xho 
I, and BamHI are used herein to illustrate the principle of 
the invention. Any restriction sites may be used as long as 
the site does not appear within the promoter or terminator 
gene sequences. 
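
The requirement that a chosen restriction site not appear
within the promoter or terminator sequence can be checked
computationally before primers are ordered. The sketch below
scans a fragment for the recognition sequences of Bgl II, Xho I
and BamHI. Because all three sites are palindromic, scanning one
strand is sufficient. The function name and the example
sequences are hypothetical and serve only as an illustration.

```python
# Recognition sequences of the enzymes used to illustrate the method.
SITES = {"BglII": "AGATCT", "XhoI": "CTCGAG", "BamHI": "GGATCC"}


def internal_sites(fragment: str, sites: dict = SITES) -> dict:
    """Return {enzyme: [0-based positions]} for every site found in fragment."""
    seq = fragment.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    # Hypothetical fragment carrying an internal BamHI site.
    print(internal_sites("ATGCGTACCGGATCCTTAGC"))
    # Hypothetical fragment with none of the three sites.
    print(internal_sites("ATGCATGCATGC"))
```

A fragment for which the function returns an empty result may
carry the corresponding sites at its ends without risk of
internal cleavage.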

To generate cloning sites compatible with cDNA or genomic
DNA inserts, the promoter gene fragments are cleaved with Bgl
II and Xho I, which yields promoter gene fragments that have
a Bgl II site at their 5' ends and an Xho I site at their 3'
ends. Terminators are cut only with Xho I and will have only
an Xho I site at their 5' end. The 5' and 3' orientations are
based on the expected direction of transcription across the
promoter or terminator gene fragment. See Figure 5B.

Partial fill-in reactions utilizing the large fragment of
E. coli DNA polymerase I (Klenow fragment) and a subset of
deoxynucleotides (in this case dCTP and dTTP) may be used to
generate promoter and terminator fragments that are incapable
of self-ligation by their Xho I ends. The Bgl II ends of the
promoter fragments cannot be affected because of the lack of
base-complementarity, and the BamHI end of the terminator
fragments has no exposed 5' end for the Klenow fragment to
utilize.
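
The effect of a partial fill-in on a 5' overhang can be worked
through with a simple model. The sketch below assumes that the
polymerase extends the recessed 3' end across the overhang one
base at a time, starting at the template base closest to the
duplex, and stops at the first base whose complement is not
supplied. Exonuclease activity and reaction kinetics are
ignored, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partial_fill_in(overhang: str, dntps: set) -> str:
    """Return the 5' overhang (written 5'->3') left after a partial fill-in.

    The 3'-most base of the overhang is copied first; extension stops when
    the nucleotide needed to copy the next template base is not in dntps.
    """
    remaining = overhang.upper()
    while remaining and COMPLEMENT[remaining[-1]] in dntps:
        remaining = remaining[:-1]
    return remaining


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """True if two 5' overhangs (both written 5'->3') are complementary."""
    return bool(overhang_a) and overhang_a == reverse_complement(overhang_b)


# Xho I leaves 5'-TCGA; filling in with dCTP and dTTP leaves 5'-TC.
xho_filled = partial_fill_in("TCGA", {"C", "T"})
# Bgl II leaves 5'-GATC; dCTP and dTTP cannot start the fill-in.
bgl_filled = partial_fill_in("GATC", {"C", "T"})
# Sau 3AI leaves 5'-GATC; filling in with dATP and dGTP leaves 5'-GA.
sau_filled = partial_fill_in("GATC", {"A", "G"})

if __name__ == "__main__":
    print(xho_filled, bgl_filled, sau_filled)
    print("Xho I end self-ligates:", can_anneal(xho_filled, xho_filled))
    print("Xho I end joins Sau 3AI insert:", can_anneal(xho_filled, sau_filled))
```

In this model the filled-in Xho I end can no longer pair with
itself but remains complementary to a Sau 3AI end filled in
with dATP and dGTP, which is the behavior relied upon for the
genomic DNA inserts of Section 5.5.3.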

Treatment with a phosphatase, such as calf intestine
alkaline phosphatase, will prevent Bgl II self-ligations, and
provide similar termini for ligations in both the promoter
and terminator fragments. cDNA fragments are protected from
digestion with Not I by incorporation of 5-methyl dCTP during
first strand synthesis (Short, J.M. 1988, Nuc Acids Res
16:7583-7600).

In an alternative embodiment of the invention, when DNA
inserts are derived from mRNA, directional cloning may be
applied to improve the efficiency of cloning. The cDNA
inserts can be unidirectionally ligated in the sense
orientation with respect to the promoter and terminator
fragments. This can be achieved by generating different,
non-ligatable ends on both promoter and terminator fragments.
Bgl II, Xho I, Xma I, and BamHI are used to illustrate the
invention. Any pair of enzymes that generate compatible ends
and can be protected by methylation can be used.

An Xma I site is substituted for the Xho I site at the 5'
ends of the terminator fragments, while the preparation of
the promoter fragments is unchanged. Xma I is used because
it is compatible with Not I by filling in with Klenow
fragment and dCTP. This results in a terminator fragment
that has a two-base dCTP-dCTP 5' overhang, which is
compatible with suitably prepared Not I digested cDNA gene
fragments. See Figure 5A.

5.5.3. PREPARATION OF DNA INSERTS 
Coding gene fragments for the eukaryotic libraries will
be derived from two principal DNA sources, namely genomic
DNA (gDNA) or complementary DNA (cDNA) derived enzymatically
from messenger RNA. Strategies for preparation of
gDNA or cDNA are very similar, but not identical.

Complementary DNA is made from messenger RNA and/or
total RNA using standard protocols available in the
literature, or according to a manufacturer's instructions.
Isolation of total RNA may be accomplished simultaneously
with genomic DNA by the guanidinium isothiocyanate method
described in Section 5.3.1, and mRNA can be isolated by
subsequent affinity chromatography over oligo-dT cellulose.

First strand cDNA synthesis can use an oligo-dT DNA
primer that contains a cloning site, e.g., a Not I site, at
the 5' end. An oligonucleotide of random sequence, which
contains an internal Not I site near its 5' end, can also be
used for randomly-primed first strand synthesis. The use of
this alternative primer avoids 3' bias for large mRNAs.
A methylated deoxynucleotide, such as 5-methyl-dCTP, may be
used with a polymerase such as Pfu to provide protection from
restriction digestion (Short et al., supra; G.L. Costa, 1994,
Strategies 7:8). Only non-methylated sites present in the
initial primers will be available for cleavage, thus ensuring
a defined 3' end for the cDNAs. Methylated cDNA can also be
produced by treatment with a methylase, but the
directionality of the cloning will be lost because all
available sites will be methylated, and thus resistant to
enzymatic cleavage.

Defined 5' ends of cDNA may be prepared by ligation of
sequence-specific adapters, such as a modified BamHI adapter
which has a 5' phosphate. When annealed to its partner
oligonucleotide, the adapter contains only a two-base
dGTP-dATP 5' overhang and a blunt 5' phosphate end. This
modified adapter can be ligated to cDNA that has been treated
with Pfu or T4 DNA polymerase as in standard protocols. After
ligation of modified BamHI adapters and digestion of the cDNA
with Not I, the adapted cDNA can be treated with Klenow
fragment and dGTP, generating a defined, directionally
oriented cDNA gene insert ready for ligation to suitably
prepared promoter and terminator fragments. The orientation
of the fragments is such that the 5' end of the cDNA is
located toward the 3' end of the promoter, and the 3' end of
the cDNA is located toward the 5' end of the terminator
fragment. See Figure 5C.

Genomic DNA fragments are obtained by partial digestion
of total genomic DNA with a frequently cutting restriction
enzyme, such as Sau 3AI. This enzyme is widely used for this
purpose, and partial digestion followed by sizing through
sucrose gradients is a standard technique. Fragment
pools from three different digestions that vary in the
initial concentration of enzyme can be used to allow for
differences in enzyme sensitivity within the genomes.

Following size fractionation and purification, the
fragments can be treated with BamHI methylase to protect any
internal BamHI sites, followed by treatment with Klenow
fragment and dATP & dGTP. This results in gene fragments
that are internally methylated at BamHI sites, and possess
only dATP-dGTP overhangs. See Figure 5D. These fragments are
incapable of self-ligation, and are only capable of ligating
to suitably prepared promoter and terminator gene fragments.

5.5.4. LIGATION OF INSERT DNA TO PROMOTERS AND 
TERMINATORS 

Suitably prepared cDNA, promoter, and terminator
fragments can be ligated at 16°C overnight. A ratio of 10
promoter (P) : 1 cDNA : 10 terminator (T) may be used in the
ligation reaction. The optimal ratio may be determined
empirically by techniques known in the art. The directional
cloning procedure provides only one ligation product, i.e., a
correctly oriented promoter-sense insert-terminator gene
cassette.
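
Because the 10:1:10 ratio is most naturally read as a ratio of
molecules rather than of mass, the mass of each component
depends on its length. The sketch below converts picomoles to
nanograms using the common approximation of 650 g/mol per base
pair of double-stranded DNA. The fragment lengths and the amount
of cDNA are hypothetical values chosen only for illustration.

```python
AVG_BP_MASS = 650.0  # approximate g/mol per base pair of double-stranded DNA


def ng_for_pmol(pmol: float, length_bp: int) -> float:
    """Mass in ng of `pmol` picomoles of a duplex `length_bp` long."""
    return pmol * length_bp * AVG_BP_MASS / 1000.0


def ligation_masses(insert_pmol: float, lengths_bp: dict, ratio=(10, 1, 10)) -> dict:
    """Masses (ng) of promoter, insert and terminator for a molar ratio."""
    promoter_parts, insert_parts, terminator_parts = ratio
    unit = insert_pmol / insert_parts
    return {
        "promoter": ng_for_pmol(unit * promoter_parts, lengths_bp["promoter"]),
        "insert": ng_for_pmol(insert_pmol, lengths_bp["insert"]),
        "terminator": ng_for_pmol(unit * terminator_parts, lengths_bp["terminator"]),
    }


# Hypothetical fragment lengths (assumptions, not from the specification).
ASSUMED_LENGTHS = {"promoter": 500, "insert": 2000, "terminator": 300}
masses = ligation_masses(insert_pmol=0.1, lengths_bp=ASSUMED_LENGTHS)

if __name__ == "__main__":
    for name, ng in masses.items():
        print(f"{name}: {ng:.0f} ng")
```

With these assumed lengths, the short promoter and terminator
fragments together outweigh the insert even though each insert
molecule is much longer, which illustrates why the ratio is
adjusted empirically.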

Ligation of prepared genomic DNA, promoter, and
terminator gene fragments may be carried out at 16°C with
varying ratios. Since none of the ligation components can
self-ligate, the optimal ratios may be determined
empirically. It is estimated that half of the ligation
products formed are directly useable, 1/4 of the products
formed cannot enter the rounds of ligations, and 1/4 of the
products can be ligated only once before terminating the
growing chain.

The following combinations are possible (P=promoter,
frag=5'-3' genomic DNA fragment, T=terminator, garf=3'-5'
genomic DNA fragment):

1. P-frag-T      5. P-garf-T
2. T-frag-P      6. T-garf-P
3. P-frag-P      7. P-garf-P
4. T-frag-T      8. T-garf-T

Combinations 1, 6 and 2, 5 represent the desired constructs,
but because the orientations of the inserts are random, it is
expected that 50% of these constructs will be in the correct
orientation for any given gene (1 and 6).
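
The estimate that one half of the products carry one promoter
and one terminator, one quarter carry two terminators, and one
quarter carry two promoters follows from enumerating the eight
combinations. The sketch below performs that enumeration under
the assumption, made here only for illustration, that every
combination forms with equal probability.

```python
from collections import Counter
from itertools import product

FLANKS = ("P", "T")               # promoter or terminator at either end
INSERTS = ("frag", "garf")        # insert in 5'-3' or 3'-5' orientation


def classify(left: str, right: str) -> str:
    """Group a ligation product by the pair of elements flanking the insert."""
    if left == right == "T":
        return "terminator/terminator"
    if left == right == "P":
        return "promoter/promoter"
    return "promoter/terminator"


products = [f"{left}-{insert}-{right}"
            for insert, left, right in product(INSERTS, FLANKS, FLANKS)]
counts = Counter(classify(p.split("-")[0], p.split("-")[2]) for p in products)
fractions = {kind: n / len(products) for kind, n in counts.items()}

if __name__ == "__main__":
    print(products)
    print(fractions)
```

The enumeration lists the same eight products given above and
reproduces the one half, one quarter, one quarter split.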

Terminator/terminator gene cassettes may form, but
cannot be involved in any subsequent cloning step because
they lack an exposed 5' end, owing to the blunted, uncut
BamHI end at their 3' termini.

Promoter/promoter constructs will clone in subsequent
ligations only to other exposed BamHI ends, because the Bgl
II end lacks a 5' phosphate (first round). Subsequent
ligations to the exposed Bgl II end should be rare with
incoming gene cassettes because of the lack of 5' phosphates.
Exposed BamHI ends will only be made possible on resident
forming chains and not on incoming new gene cassettes. Thus
it is expected that such promoter/promoter gene cassettes
will terminate a chain by circularization with a nearby BamHI
site on another chain; such circularizations are
non-recoverable. If such promoter/promoter fragments become a
significant problem to ligation efficiencies, then an
intermediate kinase treatment of the fixed growing chains
prior to addition of new gene cassettes should allow the
promoter/promoter fragments to extend the growing chains by
forming Bgl II/Bgl II ligation products. The kinase
treatment will promote Bgl II/Bgl II and Bgl II/BamHI
ligations on the solid phase, which will circularize the
growing chains involved.


5.5.5. SERIAL LIGATIONS OF GENE CASSETTES TO FORM 
CONCATEMERS 

Ligation of the gene cassettes, each consisting of
either a genomic DNA or cDNA insert flanked by a
promoter/terminator combination, will be performed in a
method analogous to that outlined previously for prokaryotic
DNAs. The major difference here is that this strategy uses
the endonuclease BamHI to create exposed 3' restriction sites
for subsequent cloning. The use of either BamHI methylase or
5-methyl-dCTP ensures that BamHI sites within the insert DNA
will be protected. See Figure 5E.

After 5-10 rounds of chain ligation, the growing chains
of concatemers will be deprotected with BamHI and prepared
for ligation to the expression vector by treatment with the
Klenow fragment and dATP and dGTP. This will render all ends
of the growing chain incapable of ligating to each other,
thus eliminating any circularization and loss of concatemer
chains.

Vector DNA can be ligated to concatemer chains in a 5:1
molar ratio; other ratios may also be used. The ligation can
be done at 16°C for 8-12 hours, or at 22°C for four hours.
Following ligation the beads can be washed and resuspended in
intron nuclease restriction buffer. Digestion will be carried
out as described by the manufacturer's instructions. Any
intron nuclease may be used. The enzyme CeuI is preferred
because it produces non-palindromic 3' overhangs, which are
useful in preventing self-ligations. See Figures 5F-5G.
5.5.6. CIRCULARIZATION AND TRANSFORMATION OF
VECTOR CONTAINING CONCATEMER CONSTRUCTS

Concatemer-vector molecules released from the solid
phase can be encouraged to undergo intra-molecular ligation
by dilution of the CeuI digestion mix 100-fold with 1X ligase
buffer. T4 ligase can be added, and the reactions may be
carried out at 22°C for 4-6 hours, or 16°C overnight. See
Figures 5F-5G. The resulting constructs may be concentrated
by microfiltration or freeze-drying, and introduced into
either S. pombe strains, or alternatively into E. coli or S.
lividans strains by standard methods. Any method may be
used, including but not limited to electroporation, and
modified calcium-phosphate transformation methods.

5.5.7. PREPARATION AND LIGATION OF PREPARED VECTOR
FOR EXPRESSION IN YEAST

This section describes procedures that may be generally 
applied to prepare combinatorial gene expression libraries 
using yeast as the host organism. 

For preparing a library in S. pombe, one possible
vector, but certainly not the only vector, is the E. coli/S.
pombe shuttle vector pDblet (Brun et al. 1995, Gene, 164:173-
177). This vector has the advantage of having multiple
cloning sites and f1 phage origins, being expressed at
moderately high copy number and being very stable in both E.
coli and S. pombe.

For the present invention, the multiple cloning site
(MCS) of pDblet may be modified to accommodate a BstXI site
of known sequence. See Figure 6B. This is because the
intron nuclease enzyme that is used to release the concatemer
chain from the solid phase generates 3' nucleotide overhangs
of a defined sequence (3'GATT...). An engineered BstXI site
having the sequence CCACCTAACTGG generates the appropriate
CTAA-3' overhang after cleavage.
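
The overhang produced by the engineered site can be derived
from the cleavage pattern of BstXI, which recognizes
CCANNNNNNTGG and cuts so as to leave a four-base 3' overhang
made up of the fifth through eighth bases of the site. The
sketch below applies that pattern to the sequence given above
and shows that the resulting overhang pairs with the 3'GATT
overhang left by the intron nuclease. The function names are
hypothetical.

```python
import re

BSTXI_SITE = re.compile(r"CCA[ACGT]{6}TGG")
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def bstxi_overhang(site: str) -> str:
    """Return the 4-base 3' overhang (written 5'->3') left by BstXI.

    BstXI cuts CCANNNNN^NTGG on the top strand and four bases upstream on
    the bottom strand, so bases 5-8 of the site remain single-stranded.
    """
    site = site.upper()
    if not BSTXI_SITE.fullmatch(site):
        raise ValueError(f"not a BstXI site: {site}")
    return site[4:8]


def paired_strand_3_to_5(overhang: str) -> str:
    """Bases that pair with the overhang, written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in overhang.upper())


vector_overhang = bstxi_overhang("CCACCTAACTGG")
partner = paired_strand_3_to_5(vector_overhang)

if __name__ == "__main__":
    print(f"vector overhang: 5'-{vector_overhang}-3'")
    print(f"pairs with:      3'-{partner}-5'")
```

Because the overhang is not palindromic, a vector cut in this
way cannot anneal to itself through its BstXI end.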
To modify pDblet, it can be first cut with SacI & NotI
to remove the existing BstXI site which does not have the
correct sequence. The pDblet plasmid, once purified by
spin-chromatography or other means, can be mixed with a
presynthesized oligonucleotide which contains, in addition to
a correct sequence for the BstXI site, a new NcoI site and
SacI- and NotI-compatible overhangs. See Figure 6C. After
ligation and transformation, mini-preps of clones are checked
for correctness by digestion with NcoI. Correct clones will
be identified by the presence of both a BstXI and NcoI site.
Treatment of this modified pDblet with BstXI followed by
XhoI generates a vector that contains a 5' XhoI site
and a 3' CTAA BstXI overhang. See Figure 5E. This cleaved
vector can be treated with Klenow fragment and dCTP and dTTP
to render it incapable of ligating to itself. Such a vector
may be used to accept the concatemer chains.

In one embodiment, the invention encompasses cosmid
vectors that contain an autonomously replicating sequence of
S. pombe, and thus can be used to prepare a combinatorial gene
expression library in S. pombe. A series of cosmid vectors
can be constructed which comprises at least one cloning site
for insertion of donor DNA, cos sites for in vitro packaging
in λ phage, replication origin(s) and selection markers for
cloning in E. coli, an autonomously replicating sequence
(ARS) of S. pombe, and one or more different yeast selection
markers, such as but not limited to puromycin, ura4,
hygromycin or zeocin.

Cosmid vector SuperCos1 (Stratagene) was linearized with
restriction endonuclease BglII. The plasmid, purified by the
geneclean procedure (Bio101), was treated with T4 DNA
polymerase to "fill in" the DNA termini. The DNA was again
purified by the geneclean procedure. The DNA was treated
with T4 DNA ligase and transformed into E. coli strain DH5α.
Clones were tested for their ability to be cut with
restriction enzyme BglII. One clone that was resistant to
BglII was isolated for further work and was called SCos-BglII.

SCos-BglII was digested/linearized with restriction
endonuclease BamHI. The linearized vector was treated with
calf-intestine alkaline phosphatase to prevent self-ligation
and DNA was purified by the geneclean procedure. An
artificial DNA linker containing two DraI, one XhoI, and one
BglII restriction endonuclease sites was ligated into the
BamHI site with T4 DNA ligase. The ligation was transformed
into E. coli strain DH5α. The resulting plasmid (pSuperCosB)
no longer contains the BamHI restriction endonuclease
cleavage site.

pSuperCosB was linearized with restriction endonuclease
AatII. The linearized vector was treated with calf-intestine
alkaline phosphatase to prevent self-ligation, and was
purified by the geneclean procedure. Concurrently, plasmid
pDblet was cut with restriction endonuclease AatII. The
digested pDblet plasmid was separated by agarose gel
electrophoresis. A 1198 bp fragment, containing the yeast
autonomously replicating sequence (ARS), was cut from the gel
and purified by the geneclean procedure. The ARS-containing
DNA fragment was then ligated into the linearized SuperCos1
vector. The ligation was transformed into E. coli strain
SC110. The resulting clone was called pPCos.

pPCos was digested with restriction endonucleases BglII
and XhoI. pDblet was digested with restriction endonucleases
BamHI and XhoI. Both digests were separated by agarose gel
electrophoresis. The ura4 gene-containing BamHI/XhoI
fragment and the pPCos vector fragment were cut from the gel
and purified by the geneclean procedure. The two DNA
fragments were ligated together using T4 DNA ligase. The
ligation reaction was then transformed into E. coli strain
DH5α. The resulting plasmid clone was called pPCos+ura.
The vector pPCos+ura is deposited at the Agricultural
Research Service Culture Collection (NRRL), Agricultural
Research Service, U.S. Department of Agriculture, 1815 North
University Street, Peoria, Illinois 61604, U.S.A. on October
24, 1996, and is given accession number B-21637N.

Alternatively, pPCos can be digested with restriction
endonucleases SmaI and Belli. This releases a DNA fragment
containing a truncated neomycin resistance gene. The vector
is purified from this fragment by agarose gel electrophoresis
and the geneclean procedure. An artificial DNA linker
containing SpeI, KpnI, and NdeI restriction endonuclease
sites is ligated into the vector using T4 DNA ligase. The
ligation is transformed into E. coli strain DH5α. The
resulting vector is called pPCos-Neo".

Moreover, pPCos-Neo' can be linearized with restriction
endonuclease NdeI. The linearized vector is treated with
calf-intestine alkaline phosphatase to prevent self-ligation
and DNA is purified by the geneclean procedure.
Concurrently, plasmid pDblet is digested with restriction
endonuclease NdeI which releases an approximately 1800 bp
fragment containing the ura4 gene. This fragment is
separated from the rest of the vector by agarose gel
electrophoresis and purified by the geneclean procedure. The
ura4 gene fragment is ligated into the pPCos-Neo" vector
backbone using T4 DNA ligase. The ligation is transformed into
E. coli strain DH5α. The resulting plasmid is pPCos1 (see
Figure 14).

5.5.8. PLANT EXPRESSION LIBRARIES

This section describes procedures that may be generally 
applied to prepare combinatorial gene expression libraries 
using plant cells as donor and/or host organisms. 



For preparation of donor DNA from plants, the following
general procedure is applied: (1) a pretreatment of the plant
tissue in cold ether to enhance cell disruption; (2)
mechanical homogenization of the tissue by grinding with
sand, glass beads or aluminum oxide; (3) filtration through a
mesh to remove cell debris; and (4) extraction of the DNA by
the procedures described in 5.1.2. The resulting purified
DNA is modified as described in Section 5.5.3. The CaMV 35S
or nopaline synthase promoter, and nopaline synthase
terminator fragments are prepared by PCR as described in
Section 5.5.2. The promoter and terminator fragments are
attached to the DNA fragments, and ligated to a plant DNA
vector as described in 5.5.5 and 5.5.6.

A preferred plant DNA vector is Bin19 or its variants
which uses T-DNA borders and trans acting functions of the
vir region of a co-resident Ti plasmid in Agrobacterium
tumefaciens to transfer the donor genetic material into the
nuclear genome of plant host cells (Bevan 1984, supra).
Modified Bin19 vectors containing a multiple cloning site,
such as pBI121 or pBI221 which are commercially available
(Clontech, Palo Alto), can be used. Kanamycin resistance
and/or β-glucuronidase activity are used as markers for
monitoring transformation, and for pre-screening.

Plant protoplasts are prepared from leaves of Nicotiana
tabacum plants as described in Potrykus et al. 1988 in
"Methods for Plant Molecular Biology" Weissbach and Weissbach
ed. Academic Press, page 376-378. The expression constructs
are introduced into protoplast cells by transformation using
polyethylene glycol as described in Power et al. 1988 in
"Methods for Plant Molecular Biology" Weissbach and Weissbach
ed. Academic Press, page 388-391. The transformed
protoplasts are selected by antibiotic resistance, e.g.,
kanamycin, and can be encapsulated for pre-screening as
described in Section 5.4.10.
6. EXAMPLE: CONSTRUCTION AND SCREENING OF
COMBINATORIAL GENE EXPRESSION
LIBRARY

The following subsections describe the preparation and
pre-screening of combinatorial gene expression libraries
using mixtures of terrestrial microorganisms or marine
microorganisms as donor organisms. The libraries utilize
Streptomyces lividans, E. coli and S. pombe as host
organisms. The results show that some of the library cells
display metabolic activity of the donor organisms, indicating
that potentially interesting donor metabolic pathways are
functional in the host organisms. In addition, it is shown
that one library clone contains DNA encoding a marine
bacterial protein that shares sequence homology to a known
enzyme in a metabolic pathway.


6.1. MATERIALS AND METHODS 
Reagents useful in the present method are generally 
commercially available. For example: 

Gene Clean, Genome kit (Bio101, Vista, CA); Restriction
enzymes, PCR reagents, and buffers (Promega, Madison, WI; New
England Biolabs; Stratagene, La Jolla, CA); TA cloning kit
(Invitrogen, La Jolla, CA); Bacterial media (Difco, Inc.);
Mira Tip (Hawaiian Marine Imports, Inc.); pBSK plasmid,
XL1-MR cells, SuperCos 1 cosmid, Gigapack packaging extracts
(Stratagene, La Jolla, CA); Qiagen QIAprep plasmid
purification kit (Qiagen, Inc., Chatsworth, CA);
avidin-conjugated magnetic porous glass (MPG) beads (CPG,
Inc., New Jersey); petri dishes, 96- and 384-well plates,
Omni-Trays (Nunc), 96- and 384-pin replicator and forms (V & P
Scientific, San Diego, CA); ampicillin (IBI, Inc., CA); green
fluorescent protein and GFP cDNA (Clontech, Inc.);
oligonucleotides (Genset, La Jolla, CA); bacterial species
and DNA sequences not elsewhere designated (American Type
Culture Collection, Rockville, MD);
7-ethoxy-heptadecyl-coumarin, BCECF-AM (Molecular Probes,
Oregon); 3-methyl benzoate, 3-chlorotoluene, m-toluate,
tetracycline, chloramphenicol, acetaminophen, arsenic,
antimony, cis-cis-muconate, and other chemicals unless noted
(Sigma); and Dynabeads, MPC-M (Dynal, Inc., Lake Success, NY).

6.1.1. MEDIA PREPARATION 
Purified water (ddH2O) for general use in media and
solutions is purified by softening, reverse osmosis, and
deionization. Pacific seawater (sea H2O) is obtained from
Scripps Institution of Oceanography (La Jolla, CA) and
filtered before use. Synthetic seawater (SSW) is prepared from
ddH2O by the addition of salts (45.2mm NaF, 48.8mm SrCl2,
0.324mM H3BO3, 0.563mM KBr, 6.25mM KCl, 4.99mM CaCl2, .7mM
Na2SO4, 16.4mM MgCl2, 268mM NaCl, 45.8mM Na2SiO3, 1.10mM EDTA,
1.58mM NaHCO3) and marine trace elements (0.01% Mira Tip).

LB medium is prepared from ddH2O with 1% tryptone, 0.5%
yeast extract, 1% NaCl. W2-B1 is prepared from 75% sea H2O or
SSW with 0.25% peptone, 0.15% yeast extract, 0.6% (vol/vol)
glycerol.

F10A is prepared from ddH2O containing 2.5% soluble
potato starch, 0.2% glucose, 0.5% yeast extract, 0.5%
peptone, 0.5% Distiller's solubles (Nutrition Products Co.,
Louisville, KY), 0.3% calcium carbonate with pH adjusted
to 7.



6.2. PRE-SCREENING OF ACTINOMYCETES/STREPTOMYCES
LIVIDANS COMBINATORIAL NATURAL PATHWAY
EXPRESSION LIBRARY BY PLATE REPLICATION AND
MACRODROPLET ENCAPSULATION

Thirty four actinomycetes species, identified as species 
# 501-534 were used as donor organisms. The organisms were 
cultured in F10A medium separately, and genomic DNA was 
extracted and purified as described in Section 5.3.1. 

Approximately 100 ng genomic DNA per species was
obtained and mixed together for partial restriction digestion
by Sau3A as described in Section 5.4.2. Fragments of genomic
DNA were subjected to size fractionation by sucrose gradient
centrifugation, and fractions containing 20-40 kb fragments
were pooled and partially filled-in with the Klenow fragment
so as to be compatible with similarly-prepared vectors below
(Korch 1987, Nuc Acids Res 15:3199-3220; Loftus et al. 1992
Biotechniques 12:172-175). 0.5-3.0 µg of the pooled
fragments were ligated in multiple batches to 0.5-3.0 µg of
pIJ922 and pIJ903 (Hopwood 1985, supra) vector prepared with
BamHI or XhoI. The ligated expression constructs were
transformed into the host organism, Streptomyces lividans
strain TK64, which had been made competent by removal of cell
walls with lysozyme (Hopwood 1985, supra). Approximately
11,000 unique clones were generated, amplified and stored as
mycelia in 20% glycerol and as spore suspensions in 50%
glycerol at -70°C.
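
The extent to which a library of this size samples the pooled
donor genomes can be estimated with the standard library
coverage formula, P = 1 - (1 - f)^N, where f is the ratio of
insert size to total genome size and N is the number of clones.
The sketch below applies it under assumptions that are not part
of this example: an average genome size of 8 Mb per
actinomycete species, equal representation of all thirty four
species, and an average insert of 30 kb (the midpoint of the
20-40 kb fractions).

```python
import math


def coverage_probability(n_clones: int, insert_bp: int, genome_bp: int) -> float:
    """Probability that a given sequence is present in at least one clone."""
    f = insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones


def clones_needed(probability: float, insert_bp: int, genome_bp: int) -> int:
    """Number of clones required to reach the requested probability."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))


ASSUMED_GENOME_BP = 8_000_000      # assumed average genome size (assumption)
N_SPECIES = 34                     # donor species 501-534
ASSUMED_INSERT_BP = 30_000         # midpoint of the 20-40 kb fractions
pooled_bp = N_SPECIES * ASSUMED_GENOME_BP

p_present = coverage_probability(11_000, ASSUMED_INSERT_BP, pooled_bp)
n_for_99 = clones_needed(0.99, ASSUMED_INSERT_BP, pooled_bp)

if __name__ == "__main__":
    print(f"probability a given locus is in the library: {p_present:.2f}")
    print(f"clones needed for 99% probability: {n_for_99}")
```

Under these assumptions a given locus has roughly a 70% chance
of being represented among 11,000 clones; different genome
sizes or unequal representation of the donors would change the
estimate.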

To prepare the libraries for screening individual
clones, the transformed TK64 host cells were spread on 150 mm
Petri dishes filled with F10A agar. After spreading, the
plates were allowed to incubate for 21 hours at 30°C. A
selection was performed by overlaying plates with
thiostrepton at 5 µg/ml, 1 ml/plate. After 48-72 hours,
colonies were picked with sterile toothpicks and transferred
one per well to 96-well plates. Each well contained F10A
media. These inoculated master plates were placed at 30°C
for 1-4 days. The overnight master 96-well plates were used
as source plates to replicate into one or more working
96-well plates or Omni-Trays. The master 96-well plates were
then sealed individually and frozen at -80°C. Replication
was done with a 96-pin replicator which was sterilized by
flaming before each use.
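
Tracking clones through master and working plates requires a
consistent mapping between a picked colony and its plate and
well. The sketch below shows one such mapping, assuming that
wells are filled row by row (A1 to A12, then B1, and so on);
this filling order and the function names are assumptions made
for illustration.

```python
import math

ROWS = "ABCDEFGH"      # 8 rows of a 96-well plate
COLUMNS = 12           # 12 columns of a 96-well plate
WELLS = len(ROWS) * COLUMNS


def well_for_clone(index: int) -> tuple:
    """Map a 0-based clone index to (1-based plate number, well name)."""
    plate, offset = divmod(index, WELLS)
    row, column = divmod(offset, COLUMNS)
    return plate + 1, f"{ROWS[row]}{column + 1}"


def plates_needed(n_clones: int) -> int:
    """Number of 96-well master plates required for n_clones."""
    return math.ceil(n_clones / WELLS)


if __name__ == "__main__":
    print(well_for_clone(0))        # first picked colony
    print(well_for_clone(95))       # last well of the first plate
    print(well_for_clone(96))       # first well of the second plate
    print(plates_needed(11_000))    # plates for the full library
```

Arraying all 11,000 clones of this library one per well would
occupy 115 master plates.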

Working 96-well plates were used as source plates to
replicate the library onto a series of differential and/or
selective media and indicator plates. Selective antibiotics
included erythromycin, novobiocin and neomycin. Differential
media included F10A and R5 medium containing substrates
X-glucopyranoside and X-gluconic acid. Indicator plates
included library clones grown on F10A then overlaid with an
indicator lawn of Enterococcus faecalis (E. faecalis),
Bacillus subtilis (B. subtilis) or SOS Chromotest (with
X-gal). The results are compiled and compared to the profiles
of Streptomyces host TK64.

The clones of the library are also pre-screened by
macrodroplet encapsulation. For each pre-screen, 50,000
amplified clones of the library are encapsulated by the
method as described in Section 5.4.13.

6.3. PRE-SCREENING OF ACTINOMYCETES/E. COLI
COMBINATORIAL CHIMERIC PATHWAY EXPRESSION
LIBRARY BY MACRODROPLET ENCAPSULATION

Genomic DNA obtained from the thirty four actinomycetes
species (identified as species # 501-534), as described in
Section 6.2, was used in the preparation of a combinatorial
chimeric pathway gene expression library in an E. coli
host. Fractions containing fragments of Sau3A-digested
genomic DNA of 2-7 kb were pooled.

Aliquots of the genomic DNA fragments are ligated to the
different promoters separately to form gene cassettes as
described in Section 5.5.3. The concatemers are formed by 8
cycles of ligation and deprotection using a different pool of
gene cassettes for each cycle, such that the resultant
concatemers each have 8 gene cassettes comprising 8 different
promoters attached to fragments of genomic DNA.
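
The cycling scheme can be illustrated with a small simulation
in which each cycle adds one cassette drawn from the pool
prepared with that cycle's promoter. The promoter labels P1 to
P8, the pool size and the function name are hypothetical
placeholders; the sketch models only the bookkeeping of the
assembly, not the ligation chemistry.

```python
import random

N_CYCLES = 8           # ligation/deprotection cycles, one cassette per cycle
POOL_SIZE = 100        # hypothetical number of distinct cassettes per pool

# One pool per cycle; every cassette in a pool carries the same promoter.
pools = [
    [f"P{cycle + 1}-frag{n}" for n in range(POOL_SIZE)]
    for cycle in range(N_CYCLES)
]


def assemble_concatemer(cassette_pools: list, rng: random.Random) -> list:
    """Build one concatemer by drawing one cassette from each pool in order."""
    return [rng.choice(pool) for pool in cassette_pools]


rng = random.Random(0)     # fixed seed so the example is repeatable
concatemer = assemble_concatemer(pools, rng)
promoters = [cassette.split("-")[0] for cassette in concatemer]

if __name__ == "__main__":
    print(" | ".join(concatemer))
    print("distinct promoters:", len(set(promoters)))
```

Because each cycle draws from a different pool, every
concatemer in the simulation contains exactly 8 cassettes with
8 different promoters, while the genomic fragments vary from
one concatemer to the next.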

Ten micrograms of the concatemers were circularized and
ligated to 0.5 µg of SuperCos 1 vectors at the BamHI site to
form expression constructs, which were packaged in vitro for
infection of the E. coli host cells XL1-MR according to the
manufacturer's directions (Stratagene). Approximately
1,000,000 unique clones are obtained, amplified and pooled
to form an amplified library. The library was stored at
-70°C. Amplified cells are encapsulated as in Section 5.4.10,
and pre-screened as in 5.4.14.
6.4. PRE-SCREENING OF FUNGAL/SCHIZOSACCHAROMYCES
POMBE COMBINATORIAL CHIMERIC PATHWAY
EXPRESSION LIBRARIES BY MACRODROPLET
ENCAPSULATION

Two combinatorial chimeric pathway expression libraries
were prepared using the following fungal donor organisms
obtained from ATCC: Trichoderma reesei, Fusarium oxysporum,
Penicillium roquefortii, Rhizopus oligosporus, Neurospora
crassa, Phycomyces blakesleeanus, Aspergillus fumigatus,
Aspergillus flavus, Emericella heterothallica, Chaetomium
gracile, Penicillium notatum, Penicillium chrysogenum.

Each species was cultured separately in 500 ml potato
dextrose agar (PDA; Difco) or malt extract agar (MEA; Difco)
at medium rpm for 48-72 hours. Spore inoculations of
1x10^4-1x10^6 spores per ml were placed into 500 ml of potato
extract or malt extract broths in 1 liter culture flasks and
grown at 22°C, 225 rpm, 48-72 hours.

Cultures were harvested by filtration through Miracloth
(Calbiochem) under vacuum. The collected mycelial masses were
washed with 2 litres of ddH2O, and air-dried for 10 minutes
before freeze drying. Fungal genomic DNA and mRNA were
extracted and purified from the mycelia as described in
Sections 5.3.1 and 5.3.2. A portion of the harvested mycelia
were freeze-dried and stored at -70°C.

Fungal genomic DNA fragments were prepared as described
in Section 5.4.2. Fungal mRNA was converted into cDNA
according to standard methods (Sambrook et al. 1989; Watson
CJ & Jackson JF (1985) DNA cloning: A practical approach,
pp. 79-88, IRL Press). Weight equivalents of DNA fragments from
each species were pooled to yield a genomic DNA pool and a
cDNA pool.

Each of these pools, containing approximately 5-10 µg of
DNA, is used independently to assemble a combinatorial
chimeric pathway expression library. The following S.
pombe-compatible promoters and terminators were generated as
described in Section 5.5.2: CMV immediate/early, SV40
early, RSV, HSV thymidine kinase, CaMV, nmt1, adh1 and ura4
promoters. The promoter and terminator fragments are
combined with the cDNA and genomic DNA pools as described in
Section 5.5.4. The gene cassettes, averaging 5 kb in length,
are concatenated as described in Section 5.5.5. The final
concatemers, containing 8 gene cassettes each, are circularized
and inserted into the modified pDblet vector (Brun et al.
1995, Gene, Vol. 164 pp. 173-177) as described in Section
5.5.7. The expression constructs were transformed into S.
pombe cells via the lithium acetate method of Gietz and Woods
(RD Gietz & RA Woods, Molecular genetics of yeast: A practical
approach, chapter 8, pp 121-134). Upon selection for the
presence of the ura4 marker, 110,000 S. pombe clones are
obtained and amplified. The clones are pooled to form an
amplified library ready for pre-screening. The following
pre-screens are performed: enzyme substrate test,
anti-microbial activity, antibiotic resistance.
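
As a rough check on the scale of this library, the figures given above (8 gene cassettes per concatemer, about 5 kb per cassette, 110,000 clones) can be combined as in the following sketch. The function and variable names are illustrative only and are not part of the disclosure; the total includes promoter and terminator sequence, so it overstates the amount of donor DNA.

```python
def library_dna_content(cassettes_per_concatemer, cassette_kb, clones):
    """Return (average insert size in kb, total cloned DNA in kb)."""
    insert_kb = cassettes_per_concatemer * cassette_kb
    return insert_kb, insert_kb * clones

# Values taken from the text of Section 6.4.
insert_kb, total_kb = library_dna_content(8, 5, 110_000)
print(f"average insert: {insert_kb} kb")
print(f"total cloned DNA: {total_kb / 1e6:.1f} Gb")
```

An insert of this size is consistent with the use of a vector able to carry several gene cassettes at once.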

6.5. PRE-SCREENING OF MARINE GRAM(-)/E. COLI
LIBRARY BY PLATE REPLICATION

Marine bacteria obtained from seawater collected near
the Bahamas Islands were provided by the Harbor Branch
Oceanographic Institute. Each of the wild-type gram-negative
pigmented marine bacterial species was tested prior to
preparation of the DNA libraries to determine redundancy, and
to help determine the array of pre-screens to be done on the
completed libraries.

The following assays were performed on the parental
species of the marine gram-negative/E. coli library, with the
indicated results:

Table V:

Assay                     Positive species out of 37 species
Chromazurol S (CAS)       27
Streptococcus pyogenes    0
Enterococcus faecalis     3
Proteus mirabilis         1
Sarcina aurantiaca        10
Staphylococcus aureus     6
Starch digestion          17


Of these assays, the following were selected to be
performed on the cells of the combinatorial gene expression
library in E. coli: CAS; S. aureus; S. aurantiaca; starch
digestion.
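
The text does not state how the four pre-screens were chosen. As an observation only, the sketch below tabulates the Table V counts as fractions of the 37 species tested and shows that the selected assays coincide with the four most frequently positive ones. The dictionary and variable names are illustrative.

```python
# Positive species out of 37, as listed in Table V.
TABLE_V = {
    "Chromazurol S (CAS)": 27,
    "Streptococcus pyogenes": 0,
    "Enterococcus faecalis": 3,
    "Proteus mirabilis": 1,
    "Sarcina aurantiaca": 10,
    "Staphylococcus aureus": 6,
    "Starch digestion": 17,
}
SPECIES_TESTED = 37

# Assays carried forward to the library pre-screen, per the text.
SELECTED = {
    "Chromazurol S (CAS)",
    "Staphylococcus aureus",
    "Sarcina aurantiaca",
    "Starch digestion",
}

ranked = sorted(TABLE_V.items(), key=lambda item: item[1], reverse=True)
for assay, positives in ranked:
    flag = "selected" if assay in SELECTED else ""
    print(f"{assay:<24} {positives:>2}/{SPECIES_TESTED} "
          f"({100 * positives / SPECIES_TESTED:.0f}%) {flag}")

top_four = {assay for assay, _ in ranked[:4]}
```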

Briefly, each of the 40 parental species was inoculated
into 5 ml of B3 medium and cultured overnight at 30°C, 300
rpm in Falcon 2059 tubes. The overnight cultures were
pelleted and the total genomic DNA extracted by standard
procedures. Genomic DNA was quantified by visualization on
an agarose gel and 5 µg DNA from each of the 40 species was
contributed to a pool totaling 200 µg. The combinatorial
natural pathway expression libraries were assembled in E.
coli as described in Section 5.1.4. This DNA was partially
digested, ligated to SuperCos 1 and packaged in λ phage for
introduction into E. coli according to the SuperCos 1
manufacturer's directions (Stratagene). This resulted in 5 x
10^6 unique clones, which were amplified to 7 x 10^8 cfu/ml by
standard protocols. The amplified stock was stored in 15%
glycerol at -70°C for subsequent use.
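
The pooling arithmetic above, and a rough estimate of how many clones are needed to represent the pooled genomes, can be sketched as follows. The estimate uses the standard Clarke-Carbon relation N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the total genome size. The average genome size (4 Mb per species) and insert size (35 kb) are assumptions chosen for illustration and are not given in the text; equal representation of all species in the pool is also assumed.

```python
import math

def pooled_mass_ug(n_species, ug_per_species):
    """Total DNA in the pool when every species contributes equally."""
    return n_species * ug_per_species

def clones_for_coverage(total_genome_bp, insert_bp, probability):
    """Clarke-Carbon estimate of clones needed to include any given
    sequence with the stated probability."""
    f = insert_bp / total_genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - f))

pool_ug = pooled_mass_ug(40, 5)           # figures from the text

ASSUMED_GENOME_BP = 4_000_000             # assumption, per species
ASSUMED_INSERT_BP = 35_000                # assumption, cosmid-sized insert
pooled_genome_bp = 40 * ASSUMED_GENOME_BP
n99 = clones_for_coverage(pooled_genome_bp, ASSUMED_INSERT_BP, 0.99)

print(f"pooled DNA: {pool_ug} ug")
print(f"clones for 99% coverage under these assumptions: {n99}")
print(f"unique clones reported: {5_000_000}")
```

Under these assumed sizes, the reported number of unique clones exceeds the estimate by a wide margin, so under-representation would more likely arise from unequal species contribution than from library size.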

To prepare the libraries for screening individual
clones, the amplified library cells were spread on 150 mm
Petri dishes with 50 ml LB, 100 mg/ml ampicillin and 50 mg/ml
kanamycin. The plates were previously dried for 24 hours at
ambient temperature in the dark. The 7 x 10^8 cfu/ml stock was
diluted in LB to 500 cfu/ml. One ml was spread on each 150 mm
plate. After spreading, the plates were allowed to incubate
overnight at 37°C. Resulting colonies were picked with
sterile toothpicks and transferred one per well to 384-well
plates. 6400 colonies were picked and archived. Each well
contained 75 µl LB, 50 µg/ml ampicillin, 7% glycerol. The
outer rows (80 wells total) were not inoculated but were
similarly filled with medium to provide an evaporation
barrier during subsequent incubation and freezing. These
inoculated master plates were placed at 37°C for 16 hours
without shaking. The overnight master 384-well plates were
used as source plates to replicate into one or more working
multi-well plates or Omni-Trays. The master 384-well plates
were then sealed individually and frozen at -80°C.
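
The plating figures in this paragraph imply a dilution factor and minimum plate counts that can be worked out as in the sketch below. All inputs are taken from the text; the variable names are illustrative, and the plate counts are lower bounds that assume every colony on a Petri dish is pickable.

```python
import math

stock_cfu_per_ml = 7e8        # amplified stock
target_cfu_per_ml = 500       # after dilution in LB
ml_per_plate = 1              # volume spread on each 150 mm plate
colonies_picked = 6400
wells_per_plate = 384
uninoculated_wells = 80       # outer rows used as evaporation barrier

dilution_factor = stock_cfu_per_ml / target_cfu_per_ml
colonies_per_petri = target_cfu_per_ml * ml_per_plate
petri_dishes_needed = math.ceil(colonies_picked / colonies_per_petri)

usable_wells = wells_per_plate - uninoculated_wells
master_plates_needed = math.ceil(colonies_picked / usable_wells)

print(f"dilution factor: {dilution_factor:.1e}")
print(f"Petri dishes needed (minimum): {petri_dishes_needed}")
print(f"usable wells per master plate: {usable_wells}")
print(f"384-well master plates needed: {master_plates_needed}")
```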

Replication was done with a multi-pin replicator. Before and 
after each use, the 384-pin replicator was dipped 
sequentially into bleach for 20 seconds, water for 30 
seconds, then ethanol for 5 seconds before flaming. 

Working multi-well plates or Omni-Trays were used as
source plates to replicate the DNA libraries onto a series of
differential and/or selective media (e.g. siderophore
detection media (CAS) or antimicrobial lawns). The results
were compiled and compared to the profiles of the wild-type
marine bacteria used to construct the DNA library.

Six clones were isolated that were positive for starch
digestion ability. These clones were tested for the ability
to inhibit growth of S. aureus or S. aurantiaca, and one
clone was found to inhibit the growth of S. aurantiaca. This
clone was subjected to further analysis, including DNA
sequence analysis, and was found to contain DNA sequences
encoding proteins homologous to those in a polyketide
synthesis pathway. Figure 10 shows the alignment of the
predicted amino acid sequence of a DNA sequence derived from
clone CXOAMN20 with the actinorhodin dehydrase gene of
Streptomyces coelicolor.

The active component from this clone is further analyzed
by extraction with organic solvents and purification guided
by anti-microbial assays.

The DNA sequence contained in this clone was further
examined by multiplex PCR to determine the cognate parental
species. PCR primers were selected and synthesized based on
the sequence of the clone. Highly conserved ribosomal RNA
primer sequences were used in the PCR as a positive control.
The positive control generates a fragment of approximately
2 kb. The amplicon generated from the clone or its cognate
parental species was less than 600 bp. Initially, multiplex
PCR reactions were performed by standard methods using a set
of four pools of genomic DNA of the parental species. Genomic
DNA from Pools 1-3 produced the amplicon upon amplification.
See Figure 11. The multiplex PCR reactions were repeated
with genomic DNA of individual parental species. Figure 12
shows that genomic DNA derived from species #6 from Pool 1,
species #18 from Pool 2 and species #31 from Pool 3 was
positive in the PCR reaction. This suggested that the
identified DNA sequence was likely derived from one of these
3 species of marine bacteria.
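
The two-stage logic described above (test pools first, then test individual species only from positive pools) can be sketched as follows. The composition of the four pools is not given in the text, so the pool boundaries below are hypothetical and were chosen only to be consistent with the reported results; `mock_pcr` is a stand-in for the laboratory reaction and is not a real assay.

```python
def deconvolute(pools, is_positive):
    """Return {pool name: [positive species]} for every positive pool.

    pools       -- dict mapping pool name to a list of species numbers
    is_positive -- callable taking a list of template species and
                   returning True if the amplicon is produced
    """
    hits = {}
    for name, members in pools.items():
        if is_positive(members):                      # stage 1: pooled DNA
            hits[name] = [s for s in members
                          if is_positive([s])]        # stage 2: individuals
    return hits

# Species reported positive in the text (Figure 12).
REPORTED_POSITIVES = {6, 18, 31}

def mock_pcr(template_species):
    """Stand-in for a PCR reaction: positive if any template carries the sequence."""
    return any(s in REPORTED_POSITIVES for s in template_species)

# Hypothetical pool composition (not stated in the source).
pools = {
    "Pool 1": list(range(1, 13)),
    "Pool 2": list(range(13, 25)),
    "Pool 3": list(range(25, 37)),
    "Pool 4": list(range(37, 41)),
}

hits = deconvolute(pools, mock_pcr)
for pool, species in hits.items():
    print(pool, "->", species)
```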

Thus, the results show that the combinatorial gene
expression library contains clones carrying genetic material
derived from marine bacteria that encodes a metabolic pathway
of interest. Furthermore, it is shown that such clones in
the library can be identified and isolated by pre-screening.

6.6. PRE-SCREENING OF MARINE GRAM(-)/E. COLI
LIBRARY BY MACRODROPLET ENCAPSULATION

30,000 clones were encapsulated as follows. Sodium
alginate (Protanal LF 20/60, Pronova Biopolymer, Drammen,
Norway) was dissolved in 100 ml of sterile water at a
concentration of 1% using an overhead mixer at 2000 rpm.
One ml of library suspension containing 30,000 cells was
added so as to embed 1-5 clones per droplet. The mixture was
allowed to sit for 30 minutes to degas. The mixture was then
extruded through a 25 gauge needle, and the drops fell into a
gently stirring 0.5 L beaker of 135 mM calcium chloride.
Droplets were allowed to harden for 10 minutes and
then were transferred to a sterile flask, and the calcium
chloride was removed and replaced with LB/Amp media and a
substrate, X-glucosaminide, at 80 µg/ml. Other substrates
were X-acetate, X-glucopyranoside, X-gal and specific custom
substrates relevant to polyketide pathways. Flasks
containing the droplets were then shaken at 30°C overnight
and examined the following morning for positive clones
indicated by the presence of blue colonies. Clones are also
co-encapsulated with indicator cells as described in Section
5.4.14. Indicator cells include S. aureus and S. aurantiaca.
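
Whether a given cell density yields 1-5 clones per droplet depends on droplet volume, which the text does not state. The sketch below assumes that cells are distributed among droplets according to a Poisson distribution and uses a hypothetical droplet volume of 10 µl; both are illustrative assumptions, as are the function names.

```python
import math

def mean_clones_per_droplet(cells, total_volume_ml, droplet_volume_ml):
    """Expected number of cells captured in one droplet."""
    return cells / total_volume_ml * droplet_volume_ml

def poisson_prob_between(mean, k_min, k_max):
    """P(k_min <= K <= k_max) for a Poisson-distributed count K."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k)
               for k in range(k_min, k_max + 1))

# 30,000 cells in 100 ml alginate plus 1 ml suspension (from the text).
# The 0.010 ml droplet volume is an assumed value, not from the source.
lam = mean_clones_per_droplet(30_000, 101.0, 0.010)
p_target = poisson_prob_between(lam, 1, 5)
p_empty = poisson_prob_between(lam, 0, 0)

print(f"mean clones per droplet: {lam:.2f}")
print(f"fraction of droplets with 1-5 clones: {p_target:.2f}")
print(f"fraction of empty droplets: {p_empty:.2f}")
```

A larger or smaller droplet shifts the mean proportionally, so the same function can be used to choose a cell density for a measured droplet size.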

Droplets were placed in a single layer in a large clear
tray and scanned by eye. One X-glucosaminide positive was
recovered, resuspended in 15% glycerol and stored at -70°C.
Other positive colonies are removed and placed in 96-well
master plates containing LB/Amp and 50 mM sodium citrate pH
7.4 to dissolve the matrix, and allowed to grow at 37°C
overnight. These overnight master 96-well plates are used as
source plates to replicate into one or more working
multi-well plates or Omni-Trays. The master 96-well plates
are then sealed individually and frozen at -80°C. Positive
clones are either sent for specific testing of the products
or sent through another round of pre-screening or screening.
Further screening is performed by replication, which is done
with a multi-pin replicator.

7. EXAMPLE: CONSTRUCTION AND SCREENING OF
ACTINOMYCETES/STREPTOMYCES LIVIDANS
COMBINATORIAL GENE EXPRESSION
LIBRARY

The following subsections describe the preparation of a
combinatorial natural pathway expression library and a biased
combinatorial gene expression library using a mixture of
terrestrial microorganisms as donor organisms. Both
libraries utilized E. coli as an archival host, and
Streptomyces lividans as the expression host.

Briefly, to make the combinatorial natural pathway
expression library, an archival cosmid library was made with
the DNA of donor organisms. The inserts of the library were
then isolated and recloned into Streptomyces vectors for
introduction into the host, Streptomyces lividans. To make
the biased combinatorial gene expression library, the
archival library was pre-selected by hybridization with genes
encoding type II polyketide biosynthetic pathways. Genetic
material potentially encoding type II polyketide pathways
from the actinomycete donor organisms was isolated, randomly
mixed, and re-cloned into S. lividans to form the
combinatorial expression library.

The clones from both libraries were analysed for
antibiotic activities that are potentially the products of
novel or hybrid metabolic pathways that are functional in the
host organism.

7.1. MATERIALS AND METHODS 

According to the invention, an archival library was
constructed in the cosmid pWE15 and E. coli using the
chromosomal DNA isolated from thirty-four donor actinomycete
strains.

To make the combinatorial natural pathway gene
expression library, cosmid DNA was prepared from clones of
the archival cosmid library and pooled. Because
actinomycete donor DNA has a high GC content, the
enzyme DraI (which has no G or C in its recognition
site) was used to isolate the actinomycete donor DNA from the
pooled cosmid DNA. DNA fragments greater than 25 kb in size
were enriched by sucrose gradient centrifugation, and ligated
to linkers compatible with the cloning sites of the expression
vectors. The following Streptomycete expression vectors were
used to form the library: pIJ941, pIJ702, pIJ699, pIJ922 and
pIJ903 (Hopwood et al., 1985 Genetic Manipulation of
Streptomyces, A Laboratory Manual, The John Innes
Foundation). The combinatorial natural pathway expression
library was then introduced into the expression host S.
lividans TK64. Eight thousand two hundred (8,200) clones
were picked, cultured separately, and analysed for the
ability to inhibit growth of Micrococcus luteus,
Staphylococcus aureus, Bacillus subtilis, Escherichia coli,
Saccharomyces cerevisiae, Candida albicans, and Penicillium
chrysogenum. The clones were also tested for DNA reactivity
by the SOS test.

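
The reasoning behind the choice of DraI can be made concrete with a simple base-composition estimate. DraI recognizes TTTAAA; if bases are treated as independent with A and T each at frequency (1 - GC)/2, the expected spacing between sites grows steeply as GC content rises. The 72% GC figure below is an assumed value typical of actinomycete genomes and is not taken from the text; the independence model is a simplification.

```python
def expected_site_spacing_bp(site, gc_content):
    """Expected distance between occurrences of a recognition site,
    assuming independent bases with the given overall GC content."""
    p_gc = gc_content / 2          # probability of G (or of C)
    p_at = (1 - gc_content) / 2    # probability of A (or of T)
    p_site = 1.0
    for base in site.upper():
        p_site *= p_gc if base in "GC" else p_at
    return 1 / p_site

DRAI_SITE = "TTTAAA"

balanced = expected_site_spacing_bp(DRAI_SITE, 0.50)
high_gc = expected_site_spacing_bp(DRAI_SITE, 0.72)   # assumed GC content

print(f"50% GC: one DraI site per {balanced:,.0f} bp")
print(f"72% GC: one DraI site per {high_gc:,.0f} bp")
```

Under this model, DraI sites in high-GC donor DNA are expected to be far apart relative to the size of a cosmid insert, which is consistent with using the enzyme to release inserts largely intact.
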
To make the biased combinatorial gene expression
library, the archival cosmid library was plated out on 150 mm
petri dishes to a density of approximately 2,000 colonies per
plate. A total of 60,000 colonies were screened. The plates
were incubated for 18 hours at 30°C in order to produce
colonies that were approximately 0.5 mm in diameter. These
colonies were then replicated to 137 mm, 0.45 µm Nytran
discs. The discs were then placed on sheets of 3 MM paper
pre-soaked in 10% SDS with the colony side up, and left to
soak for 3 minutes. These discs were transferred to sheets
of 3 MM paper presoaked in 0.5 M NaOH, 1 M NaCl for 5
minutes; then to 3 MM paper presoaked in 0.5 M Tris-HCl, 1.5 M
NaCl for another 5 minutes; and finally to 3 MM paper
presoaked in 2xSSC for 5 minutes. The discs were then left
to dry in air for at least 6 hours. The DNA was then fixed
to the filter by UV crosslinking. DNA probes specific to the
actI (Malpartida et al., 1984, Nature 309:462-464) and whiE
(Davis et al., 1989, Molecular Microbiology 4:1679-1691)
polyketide pathways were labelled using the non-radioactive
DIG labelling kit and hybridized to the filter discs at 60°C
in a solution of 5xSSC/0.1% SDS overnight. After the
incubation, the discs were washed at 60°C in 1xSSC/0.1% SDS
one to four times, and the colonies which had hybridized
with the probes were detected.

About 60-70 positive clones were isolated from the
original plates, and pooled. Cosmid DNA was isolated from
the pooled clones, and digested with DraI to separate the
cosmid vector DNA from the actinomycete DNA. In order to
randomly mix the genes in the metabolic pathways, the
actinomycete DNA was partially digested with Sau3AI to
generate fragments in the range of approximately 4-10 kb, and
the fragments were ligated to form concatemers of greater
than 50 kb. The ligated DNAs were redigested partially with
Sau3AI to generate fragments with an approximate size range
of 15-30 kb, which were ligated into the BglII site of the
vector pIJ702 (Hopwood et al., 1985 Genetic Manipulation of
Streptomyces, A Laboratory Manual, The John Innes
Foundation). The biased combinatorial expression library was
then introduced into the expression host S. lividans TK64.
Two thousand two hundred (2,200) clones were picked,
cultured separately, and analysed for the ability to inhibit
growth of Micrococcus luteus (MLUT), Staphylococcus aureus
(SA1), Bacillus subtilis (BS8), Escherichia coli (E. coli),
Saccharomyces cerevisiae (SC7), Candida albicans (CA917), and
Penicillium chrysogenum (PC). The clones were also tested
for DNA reactivity by the SOS test (SOS).
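
The effect of the digest, ligate and redigest steps can be illustrated with an idealized simulation: fragments of 4-10 kb are joined in random order until the chain exceeds 50 kb, and the chain is then cut into pieces of 15-30 kb. This is a sketch only; it ignores fragment orientation and the actual positions of Sau3AI sites, and the fragment labels, seed and function names are hypothetical.

```python
import random

def make_concatemer(fragments, min_total_kb, rng):
    """Join randomly ordered fragments until the total exceeds min_total_kb."""
    order = list(fragments)
    rng.shuffle(order)
    chain, total = [], 0.0
    for source_id, length_kb in order:
        chain.append((source_id, length_kb))
        total += length_kb
        if total > min_total_kb:
            break
    return chain

def redigest(chain, lo_kb, hi_kb, rng):
    """Cut the chain into pieces of lo_kb to hi_kb and return, for each
    piece, the list of source fragments it overlaps."""
    total = sum(length for _, length in chain)
    pieces, start = [], 0.0
    while total - start >= lo_kb:
        end = min(total, start + rng.uniform(lo_kb, hi_kb))
        sources, pos = [], 0.0
        for source_id, length_kb in chain:
            if pos < end and pos + length_kb > start:   # overlaps this piece
                sources.append(source_id)
            pos += length_kb
        pieces.append(sources)
        start = end
    return pieces

rng = random.Random(0)                                   # fixed seed for repeatability
fragments = [(f"frag{i}", rng.uniform(4, 10)) for i in range(30)]
chain = make_concatemer(fragments, 50, rng)
inserts = redigest(chain, 15, 30, rng)

for sources in inserts:
    print(len(sources), "source fragments:", sources)
```

Because no starting fragment exceeds 10 kb, every insert of 15 kb or more necessarily spans at least two source fragments, which is the mixing the procedure is intended to achieve.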






7.2. RESULTS 

For the combinatorial natural pathway expression
library, 8,200 clones were screened for antimicrobial
activities, and 205 clones (2.5%) of interest were
identified. Table VI shows the antimicrobial assay results
of 20 clones from the combinatorial natural pathway
expression library.

For the biased combinatorial gene expression library,
2,200 clones were screened for antimicrobial activities, and
71 clones (3.2%) of interest were identified. Table VII
shows the antimicrobial assay results of nine clones from the
combinatorial natural pathway expression library, which will
be subjected to chemical structural analysis.

The data show that both the combinatorial natural
pathway expression library and the biased combinatorial gene
expression library contain clones carrying genetic material
that encodes metabolic pathways, possibly novel or hybrid
polyketide pathways, that result in the production of
compounds with antimicrobial activities.
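
The reported hit rates can be reproduced directly from the clone counts given above, as in the following sketch. The function name is illustrative, and no statistical comparison of the two libraries is implied.

```python
def hit_rate_percent(hits, screened):
    """Percentage of screened clones scored as being of interest."""
    return 100 * hits / screened

natural = hit_rate_percent(205, 8200)   # natural pathway library
biased = hit_rate_percent(71, 2200)     # biased library

print(f"natural pathway library: {natural:.1f}%")
print(f"biased library: {biased:.1f}%")
print(f"ratio biased/natural: {biased / natural:.2f}")
```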

Having thus disclosed exemplary embodiments of the 
present invention, it should be noted by those skilled in the 
art that the disclosures are exemplary only and that various 
other alternatives, adaptations, and modifications may be 
made within the scope of the present invention. Accordingly, 
the present invention is not limited to the specific 
embodiments as illustrated herein. All references cited 
herein are incorporated by reference herein in their 
entireties for all purposes.

WHAT IS CLAIMED IS:



1. A recombined combinatorial gene expression
library, comprising a pool of expression constructs, each
expression construct containing randomly at least one cDNA or
genomic DNA fragments, wherein the cDNA or genomic DNA
fragments in the pool of expression constructs are derived
from a plurality of species of donor organisms and have been
subjected to homologous or homeologous recombination, and
wherein the cDNA or genomic DNA fragments in each expression
construct are operably-associated with one or more regulatory
regions that drives expression of genes encoded by the cDNA
or genomic DNA fragment in an appropriate host organism.

2. The gene expression library of claim 1 wherein
some of the cDNA or genomic DNA fragments contained in the
expression constructs are preselected for comprising DNA
sequences that display sequence similarities to nucleotide
sequences that encode proteins that form a part of a
metabolic pathway of interest.

3. The gene expression library of claim 1 wherein
the cDNA or genomic DNA fragments have been subjected to
homologous or homeologous recombination in vitro in a
reaction comprising the E. coli recA protein.

4. The gene expression library of Claim 1, 2, or 3
in which the expression construct comprises a plasmid vector,
a phage, a viral vector, a cosmid vector, or an artificial
chromosome.

5. The gene expression library of Claim 4 in which
the vector is a shuttle vector that replicates in different
host cell species or strains.

6. The gene expression library of claim 4 in which
the vector integrates into a host chromosome.



7. The gene expression library of Claim 1, 2, or 3
in which the cDNA or genomic DNA fragments are derived from
an environmental sample, a mixture of terrestrial
microorganisms, a mixture of freshwater microorganisms, or a
mixture of marine microorganisms.

8. The gene expression library of Claim 1, 2, or 3
in which each expression construct is contained in a host
cell.

9. The gene expression library of Claim 8 in which
the host cell is Escherichia coli, Bacillus subtilis,
Streptomyces lividans, Streptomyces coelicolor, Pseudomonas
aeruginosa, Myxococcus xanthus, Saccharomyces cerevisiae,
Schizosaccharomyces pombe, Spodoptera frugiperda, Aspergillus
nidulans, Arabidopsis thaliana, Nicotiana tabacum, COS cells,
293 cells, VERO cells, NIH/3T3 cells, or CHO cells.

10. The gene expression library of Claim 8 in which
the host cells further contain a reporter regimen tailored to
identify clones in the library that are expressing desirable
metabolic pathways or compounds.

11. The gene expression library of Claim 8 in which
the reporter regimen comprises DNA encoding a reporter gene
operably-associated with a regulatory region that is
inducible or modulated by the desirable metabolic pathways or
compounds expressed by the host cell.

12. The gene expression library of Claim 8 in which
the host cells are in a matrix containing a reporter regimen
tailored to identify clones in the library that are
expressing desirable metabolic pathways or compounds.

13. A method for making a recombined combinatorial
gene expression library, comprising ligating a DNA vector to 
cDNA or genomic DNA fragments to form expression constructs, 
wherein said cDNA or genomic DNA fragments obtained from a 
plurality of species of donor organisms have been subjected 
to homologous or homeologous recombination, and wherein the 
genes contained in the cDNA or genomic DNA fragments are 
operably-associated with their native or exogenous regulatory 
regions which drive expression of the genes in an appropriate 
host cell. 

14. The method of claim 13 wherein the cDNA or 
genomic DNA fragments are subjected to homologous or 
homeologous recombination in vitro comprising treating the 
cDNA or genomic DNA fragments to create single stranded 
regions, incubating the cDNA or genomic DNA fragments in the 
presence of the E. coli recA protein for a time sufficient 
for recombination to occur, and recovering the recombined 
cDNA or genomic DNA fragments. 

15. The method of claim 13 wherein some of the cDNA 
or genomic DNA fragments contained in the expression 
constructs are preselected for comprising DNA sequences that 
display sequence similarities to nucleotide sequences that 
encode proteins that form a part of a metabolic pathway of 
interest. 

16. A method for making a recombined combinatorial
gene expression library, comprising:

(a) treating cDNA or genomic DNA fragments obtained
from a plurality of species of donor organisms
to create single stranded regions;

(b) incubating the cDNA or genomic DNA fragments
with DNA fragments encoding proteins that form a
part of a metabolic pathway of interest, in a
reaction comprising a recA protein for a time
sufficient to allow recombination to occur;

(c) recovering the recombined cDNA or genomic DNA
fragments; and

(d) ligating a DNA vector to the recombined cDNA or
genomic DNA fragments to form expression
constructs,

wherein the genes contained in the cDNA or genomic DNA
fragments are operably-associated with their native or
exogenous regulatory regions which drive expression of the
genes in an appropriate host cell.

17. A method for making a recombined combinatorial
gene expression library, comprising:

(a) preselecting or enriching for substrate cDNA or
genomic DNA fragments obtained from a plurality
of species of donor organisms that are
displaying sequence similarities to nucleotide
sequences encoding proteins that form a part of
a metabolic pathway of interest;

(b) treating the substrate cDNA or genomic DNA
fragments to create single stranded regions;

(c) incubating the substrate cDNA or genomic DNA
fragments with target DNA fragments that encode
proteins that form a part of the same metabolic
pathway of interest, in a reaction comprising
the E. coli recA protein for a time sufficient
to allow recombination to occur; and

(d) introducing the recombined cDNA or genomic DNA
fragments into a host;

wherein the target DNA fragments comprises cloning vector
sequences that allow propagation of the recombined DNA
fragments in the host; and wherein the genes contained in the
recombined cDNA or genomic DNA fragments are
operably-associated with their native or exogenous regulatory
regions which will drive expression of the genes in an
appropriate host cell.

18. The method of Claim 13, 14, 15, or 16 further
comprising introducing the library of expression constructs 
into a host cell. 

19. The method of Claim 17 wherein the host cells 
comprise a reporter regimen tailored to identify clones in 
the library that are expressing desirable metabolic pathways, 
gene product, or compounds. 

20. The method of Claim 18 wherein the host cells
comprise a reporter regimen tailored to identify clones in
the library that are expressing desirable metabolic pathways,
gene product, or compounds.

21. A method for identifying a compound of interest 
in a gene expression library, comprising: 

(a) culturing the gene expression library of 
Claim 8; and 

(b) screening the gene expression library 
for a clone which produces the compound. 

22. A method for identifying a compound of interest 
in a gene expression library, comprising: 

(a) culturing the gene expression library of 
Claim 10; and 

(b) detecting a signal generated by the 
reporter regimen; 

thereby identifying a clone which produces the compound. 

23. A method for producing a compound of interest,
comprising:

(a) culturing the clone identified in claim
21; and

(b) recovering the compound from the culture
of the identified clone.

24. A method for producing a compound of interest,
comprising: 

(a) culturing the clone identified in claim 
22; and 

(b) recovering the compound from the culture 
of the identified clone.